Preface



The Java language began as a program for embedded devices, specifically con- 
sumer electronics. The driving concepts behind the language included portabil- 
ity, enabling reuse as the underlying processors were changed in new versions of 
the device; and simplicity, to keep the best aspects of related languages and to 
throw out the fluff. 

As the World Wide Web blasted onto the scene, the Java development team 
realized that with some additional functionality (specifically a GUI interface 
API) a language such as Java could readily be used to enable the specification of 
executable content on web pages. The inclusion of Java support in the Netscape 
2.0 browser provided enough publicity and support for the language that it 
immediately became the de facto standard language for programming executable 
content. 

Although Java has its roots in embedded systems and the web, it is important 
to realize that it is a fully functional high-level programming language that can 
provide users with a wide range of functionality and versatility. In “The Java 
Language: A White Paper,” Sun Microsystems developers describe Java as: 

Java: A simple, object-oriented, distributed, interpreted, robust, secure,
architecture neutral, portable, high-performance, multithreaded,
and dynamic language.

This list of terms describes Sun’s design goals for the Java language and
portrays some of the most important features of the language. Simplicity refers
to a small language learning curve, similarity to C and C++, and removal of some
standard (but dangerous) features of languages such as C++. For example, Java
does not use pointers as in C and C++, but rather references as in Pascal. This
avoids potential pointer manipulation errors by the programmer. In addition,
Java provides no header files as in C and C++, thus enabling more automated
bookkeeping (although compilers are not currently utilizing the full potential of
such a system, resulting in sometimes awkward coding constructs).

As an object-oriented language Java supports the concepts of abstract data 
typing as encapsulated in many object-oriented programming languages. The 
user defines a class which specifies a collection of data items and methods to 
operate on those items. Data and methods of the class can exist with the class 
(one instance per program execution) or with specific instances (objects) of the 
class. Classes are arranged in a hierarchy to provide mechanisms for inheritance 
and reuse. Java classes are further arranged into packages that provide some 
additional protection and bookkeeping for Java programmers. To assist the 
programmer, there exists a set of predefined classes that are provided with Java 
development/execution environments. Some of these classes are essential as 
they provide a bridge between the portable Java code and the underlying native 
operating system. 

As a distributed language, Java provides, through its API, functionality for
interprocess communication and remote data access. It is important to understand
that this distributed nature of Java is solely the benefit of good API classes and
not of any inherent distributed or parallel programming capabilities.

By an interpreted language, Sun means that the user compiles the program into an
intermediate file containing the “bytecode” implementation of the program. It is
the bytecode that is subsequently interpreted and not the raw Java source code.
The Java interpreter and the supporting run-time system implement what is called
the Java Virtual Machine.

As a robust language, Java is strongly typed and does not use pointers. These
two features greatly reduce the possibility of very common software flaws. In
addition, Java has both a built-in automatic garbage collection routine, which
prevents memory leakage, and exception handling. The exception handling allows
almost all errors to be caught and managed by the software.

As a secure language, Java provides for access control restrictions on class or 
object methods and data items. These may be implemented as part of the basic 
protection attribute of the items or through a run-time security monitor. 

As architecture neutral and portable, Java functionality does not rely on any 
underlying architecture specifics, thus allowing the code (or even the byte code) 
to be executed on any machine with a virtual machine implementation. 

As a high-performance language, Java is meant to execute well with respect 
to similar high-level languages. The use of special “just in time” compilers or 
other features may improve performance even more. 

As a multithreaded language, Java permits the development of user specified 
concurrent threads of control, as well as synchronization mechanisms to establish 
consistency between the users. 

As a dynamic language, Java is intended to be able to dynamically load code 
from the network and execute the new version of the code in the current virtual 
machine as opposed to recompiling the whole project. 

The above list of features describes Sun’s view of the Java language, a view 
that is shared by many users. We will assume that these features represent the 
base capabilities of the language. In the rest of this part we describe various 
features of Java, highlighting their challenges for formal method specifications 
of the language. 



Java Basic Data Types 

The Java language includes several built-in basic data types. These include
boolean, char and the numeric types: byte, short, int and long (for 8-, 16-, 32-
and 64-bit integer calculations) and float and double (for 32- and 64-bit floating
point operations). Java provides standard operations for these types with the few
special features discussed below. In addition, Java has a reference data type for
use with objects and a special reference data type, the array. Like Pascal and
unlike C, the Java reference data type can only be copied; no increment or other
operations can be performed on it.

Java is strongly typed and only permits limited type casting or automatic
conversions. This strengthens the reliability of the language. The only problem
is that the explicit casting of integer values (with either 32 or 64 bits) to smaller
integers such as byte results in truncation of the high-order bits, resulting in
information loss and even a potential change in sign.
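
The truncation described above can be observed directly. The following is a minimal sketch; the class and method names (NarrowingDemo, toByte) are hypothetical and chosen only for illustration.

```java
// Illustrative sketch only: NarrowingDemo and toByte are hypothetical names.
class NarrowingDemo {
    /** Narrows an int to a byte by keeping only the low-order 8 bits. */
    static byte toByte(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        System.out.println(toByte(100));    // fits in a byte: prints 100
        System.out.println(toByte(200));    // 200 = 0xC8: prints -56 (the sign changes)
        System.out.println(toByte(0x1234)); // high-order bits dropped: prints 52 (0x34)
    }
}
```

The cast itself is always legal; the language simply does not report the lost high-order bits, which is why a formal model of casts has to describe truncation explicitly.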



Java Classes 

All user-defined Java data types are specified using a class definition. A class
defines the fields and methods of the object and their appropriate access
modifiers. With Java 1.1 and later, users have the ability to define inner classes
and anonymous classes within their own classes.



Java Files and Packages 

A Java program consists of one or more packages, each of which consists of a
collection of Java classes. A class within a single package has a stronger trust
relationship with other classes in that package than with those outside of the
package. In addition, the package relationship provides the ability to utilize a
hierarchical naming convention for Java classes.

Each Java class is defined within a single file. Although a file may contain
more than one class definition, only one class in that file may be declared public
(and it must be named the same as the source file). Classes within the same file
are implicitly within the same package.



Exception Handling 

Java provides a flexible exception handling capability. Any time an exception 
occurs the violating routine can throw a named exception, abruptly terminating 
the statement. All Java statements can be encapsulated within a try-catch 
statement. If the enclosed statement throws an exception that is specified within 
the catch clause, the violating statement is terminated and the code in the catch 
clause is executed. Otherwise, the thrown exception is propagated up the call 
hierarchy. 
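
As a small illustration of this behaviour, the sketch below shows one exception being caught by a matching catch clause and the same exception propagating past a non-matching one. All names (ExceptionDemo, divide, safeDivide, unmatched) are hypothetical.

```java
// Illustrative sketch only: all class and method names are hypothetical.
class ExceptionDemo {
    /** Throws a named exception when the divisor is zero. */
    static int divide(int a, int b) {
        if (b == 0) {
            throw new ArithmeticException("division by zero");
        }
        return a / b;
    }

    /** The catch clause names the thrown exception, so it is handled here. */
    static String safeDivide(int a, int b) {
        try {
            return "result: " + divide(a, b);
        } catch (ArithmeticException e) {
            return "caught: " + e.getMessage();
        }
    }

    /** The catch clause does not match, so the exception propagates to the caller. */
    static int unmatched(int a, int b) {
        try {
            return divide(a, b);
        } catch (IllegalStateException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeDivide(6, 3));
        System.out.println(safeDivide(1, 0));
        try {
            unmatched(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("propagated up the call hierarchy: " + e.getMessage());
        }
    }
}
```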



The Java Virtual Machine 

All Java programs are compiled into an intermediate form, the Java Byte Code. 
The Java Virtual Machine (JVM) reads and executes the byte code. In addition 
the JVM is responsible for downloading and verifying byte code from local and 
remote sources. The virtual machine checks access rights to class fields and 
methods, provides links to native code libraries and even implements security 
monitors for further limited access. 



Formal Methods 

“Formal methods” is a term that refers to the application of formal mathemat- 
ical models to computer systems and subsystems. The intent of this book is 
to provide a forum for the presentation of a variety of approaches to formal
specifications, execution models and analysis of Java programs. 

There are several styles of formal methods, a few of which are used in this 
book. The most common approach to specifying the meaning of a program 
currently in use is operational semantics. The purpose behind an operational 
semantics is to provide an abstract model of the internal state of the computer 
(as referenced by the program) and to specify the modifications of that state 
with respect to program statements and expressions. A typical semantic clause 
could be of the form: 



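\[
\frac{\langle c, \sigma \rangle \rightarrow \langle c', \sigma' \rangle}
     {\langle c; c_1, \sigma \rangle \rightarrow \langle c'; c_1, \sigma' \rangle}
\]

This clause states that if partial execution of command $c$ in state $\sigma$
leaves the remaining command $c'$ to be executed in state $\sigma'$, then the
sequential composition of $c$ and $c_1$ behaves similarly: one step of $c; c_1$
in state $\sigma$ leaves $c'; c_1$ to be executed in state $\sigma'$.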
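
One way to read the sequential composition rule is as a single-step function on configurations. The sketch below is only an illustration under stated assumptions: the state is modelled as a map from variable names to integers, the only basic command is assignment of a constant, and a null command marks a finished computation. None of these names (SmallStep, Config, Cmd, Assign, Seq) come from the book, and the case where the first command finishes is a companion rule that the text does not show.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not the book's formalization.
class SmallStep {
    /** A configuration <c, sigma>; cmd == null means nothing is left to run. */
    static final class Config {
        final Cmd cmd;
        final Map<String, Integer> state;
        Config(Cmd cmd, Map<String, Integer> state) {
            this.cmd = cmd;
            this.state = state;
        }
    }

    /** A command that can perform one step of execution in a given state. */
    interface Cmd {
        Config step(Map<String, Integer> sigma);
    }

    /** Assignment of a constant, x := n; it finishes in a single step. */
    static final class Assign implements Cmd {
        final String var;
        final int value;
        Assign(String var, int value) {
            this.var = var;
            this.value = value;
        }
        public Config step(Map<String, Integer> sigma) {
            Map<String, Integer> next = new HashMap<>(sigma);
            next.put(var, value);
            return new Config(null, next);
        }
    }

    /** Sequential composition c; c1. */
    static final class Seq implements Cmd {
        final Cmd first;
        final Cmd second;
        Seq(Cmd first, Cmd second) {
            this.first = first;
            this.second = second;
        }
        public Config step(Map<String, Integer> sigma) {
            // Premise of the rule: <c, sigma> -> <c', sigma'>
            Config r = first.step(sigma);
            if (r.cmd == null) {
                // Assumed companion rule: c has finished, so c1 remains.
                return new Config(second, r.state);
            }
            // Conclusion of the rule: <c; c1, sigma> -> <c'; c1, sigma'>
            return new Config(new Seq(r.cmd, second), r.state);
        }
    }

    /** Repeats single steps until no command remains. */
    static Map<String, Integer> run(Cmd c, Map<String, Integer> sigma) {
        Config cfg = new Config(c, sigma);
        while (cfg.cmd != null) {
            cfg = cfg.cmd.step(cfg.state);
        }
        return cfg.state;
    }

    public static void main(String[] args) {
        Cmd program = new Seq(new Seq(new Assign("x", 1), new Assign("y", 2)),
                              new Assign("x", 3));
        System.out.println(run(program, new HashMap<String, Integer>()));
    }
}
```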

Another type of semantics used in this book is denotational semantics. In this
form of semantics, statements, expressions and other programming language
constructs are mapped into functions. These functions are defined as mappings
from semantic domains to semantic domains. These domains may represent anything
from the basic data values stored in variables to the effects of complex recursive
functions on the state of the system.
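
To make the idea of mapping constructs into functions concrete, the sketch below treats an expression as a function from states to integers and a statement as a function from states to states. It is a toy illustration, not a semantics from this book: the names (Denotation, lit, var, add, assign, seq) are hypothetical, the state is assumed to be a map from variable names to integers, and unbound variables are assumed to read as 0.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: a toy denotational reading with hypothetical names.
class Denotation {
    /** Meaning of an integer literal: a constant function on states. */
    static Function<Map<String, Integer>, Integer> lit(int n) {
        return s -> n;
    }

    /** Meaning of a variable: look it up in the state (assumed 0 if unbound). */
    static Function<Map<String, Integer>, Integer> var(String x) {
        return s -> s.getOrDefault(x, 0);
    }

    /** Meaning of e1 + e2: the sum of the meanings of the operands. */
    static Function<Map<String, Integer>, Integer> add(
            Function<Map<String, Integer>, Integer> e1,
            Function<Map<String, Integer>, Integer> e2) {
        return s -> e1.apply(s) + e2.apply(s);
    }

    /** Meaning of x := e: a function from states to updated states. */
    static Function<Map<String, Integer>, Map<String, Integer>> assign(
            String x, Function<Map<String, Integer>, Integer> e) {
        return s -> {
            Map<String, Integer> t = new HashMap<>(s);
            t.put(x, e.apply(s));
            return t;
        };
    }

    /** Meaning of c1; c2: function composition of the two meanings. */
    static Function<Map<String, Integer>, Map<String, Integer>> seq(
            Function<Map<String, Integer>, Map<String, Integer>> c1,
            Function<Map<String, Integer>, Map<String, Integer>> c2) {
        return c1.andThen(c2);
    }

    public static void main(String[] args) {
        // x := 2; y := x + 3
        Function<Map<String, Integer>, Map<String, Integer>> program =
                seq(assign("x", lit(2)), assign("y", add(var("x"), lit(3))));
        System.out.println(program.apply(new HashMap<String, Integer>()));
    }
}
```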
1 Introduction

This chapter presents an attribute grammar for the Java programming language 
(v. 1.1). This grammar is derived from the LALR grammar presented in the Java
Language Specification (JLS). The purpose of this grammar is to formally
specify not only the syntactic structure of Java programs, but also their static 
semantics. Specifically, in this chapter we try to formally capture all aspects of 
the language that would result in compile-time errors. These errors include, but 
are not limited to: 

— Type checking for assignment statements, ensuring that the type of the right- 
hand side of the statement is assignment compatible with the left hand side. 

— Type checking expression operands, ensuring that they are of compatible 
types. 

— Type checking method parameters, ensuring that they are the correct type 
and number. 

— Checking for duplicate variable and method names. 

— Checking for undefined variables. 

We do not actually capture all errors, but a sufficient body of them to
demonstrate the approach we are using. We have left comments within the syntax for
portions where we believe additional semantic checks are needed, as an exercise
to the reader. The grammar is written using a BNF-like notation with productions
of the form:

<NonTerm> ::= exp1
                  semantic action 1
            | exp2
                  semantic action 2

where the left hand side (LHS) non-terminal <NonTerm> can be defined in
terms of either right hand side (RHS) expression exp1 or exp2. Within the
productions we use some abbreviations to shorten the specification. We define the
following abbreviations: exp? to specify optional inclusion of the expression, exp+
to specify one or more occurrences of the expression, exp* to specify zero or
more occurrences of the expression. On the lines immediately following the RHS
expression are the semantic actions for that production. These actions involve
propagation of the attributes, up and down the parse tree, and static correctness
checks (compile-time errors). Associated with every potential compile-time
error we have placed the semantic action ERROR, which displays a string error
message related to the compile-time error.

Jim Alves-Foss (Ed.): Formal Syntax and Semantics of Java, LNCS 1523, pp. 1999.

Table 1. Language unit caller/callee relationship.



1.1 Logical Units of the Grammar 

The full grammar is broken down into several logical units, each consisting of 
a collection of productions that define non-terminals in the grammar. Table 1
depicts the hierarchical relationship between these units. A logical unit is said 
to call another logical unit if it uses a non-terminal of the other logical unit in 
the RHS of one of its productions. The called logical unit is the callee. Note that 
there are several self- and circular-references in this table. These logical units 
are defined as follows: 

names and literals - these define the lowest level constructs in the Java lan- 
guage and provide abstract representations of the low-level syntax of the 
Java language. 

packages - these define the overall structure of the Java source code files, pack- 
age and import specifications. 

types - these define the type definition facilities of Java, which includes
primitive types, reference types, class types, array types and interface types.

modifiers - these define the modifiers of various Java constructs. Such modifiers
include protection modes (e.g., public, private) and status (e.g., static, final).

interface declarations - these define the form and structure of interface
specifications.

class declarations - these define the form and structure of class specifications. 

method declarations - these define the form and structure of method speci- 
fications. 

field declarations - these define the form and structure of class and interface 
field specifications. 

variable declarations - these define the form and structure of local variable 
specifications. 

initializers - these define the initialization expressions for variable (including 
array) initializations. 

constructors - these define the constructor statements for classes. 

blocks and statements - these define the instruction and scoping constructs 
of the Java language. 

expressions - these define all expressions of the Java language. 



1.2 Attributes 

To specify the semantic aspects of the grammar, we define a set of attributes
that are used during the traversal of the parse tree specified by the grammar. For
simplicity's sake, we define the attributes using Java field and method use notation
(e.g., non.in.env defines the inherited environment from the non-terminal non).
An attribute is considered inherited if it is passed down from the non-terminal
(root of the subtree) and it is synthesized if it is created by the right-hand side
expression (child nodes). We assume that all inherited attributes are included as
fields of the inherited object in, which is specified as a field of the non-terminal
of the production. We assume that all synthesized attributes are included as fields
of the synthesized object out, which is specified as a field of the non-terminal of
the right-hand side expression.
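
The in/out convention can be pictured with a tiny parse tree in Java. The sketch below is an assumption-laden illustration, not the chapter's grammar: the node classes (Id, Plus), the reduction of env to a map from names to type names, and the rule that both operands must have the same type are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical node classes, not the chapter's grammar.
class AttrDemo {
    /** Inherited attributes: passed down from the parent node. */
    static final class In {
        Map<String, String> env = new HashMap<>();
    }

    /** Synthesized attributes: computed by a node for its parent. */
    static final class Out {
        String type; // null stands for an undefined type
    }

    static abstract class Node {
        final In in = new In();
        final Out out = new Out();
        abstract void evaluate();
    }

    /** An identifier: its type is looked up in the inherited environment. */
    static final class Id extends Node {
        final String name;
        Id(String name) {
            this.name = name;
        }
        void evaluate() {
            out.type = in.env.get(name);
        }
    }

    /** A binary expression: passes env down, then synthesizes a type. */
    static final class Plus extends Node {
        final Node left;
        final Node right;
        Plus(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
        void evaluate() {
            left.in.env = in.env;   // inherited: copied down the tree
            right.in.env = in.env;
            left.evaluate();
            right.evaluate();
            // synthesized: built from the children's out attributes
            boolean same = left.out.type != null && left.out.type.equals(right.out.type);
            out.type = same ? left.out.type : null;
        }
    }

    public static void main(String[] args) {
        Plus sum = new Plus(new Id("a"), new Id("b"));
        sum.in.env.put("a", "int");
        sum.in.env.put("b", "int");
        sum.evaluate();
        System.out.println(sum.out.type);
    }
}
```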

This section describes the attributes of the grammar. The use of these
attributes by the logical units of the language is as depicted in Table 2.

Table 2. Use of attributes in logical units of the language

context This defines the code type being executed, whether it is a static or
normal method, etc. This attribute is only inherited. The methods of this
attribute are:

— addPackage(name) which adds the specified package name to the current
context.

— addClass(name, mods) which adds the specified class name with its
correct modifiers mods to the current context.

— addInterface(name, mods) which adds the specified interface name
with its correct modifiers mods to the current context.

— addMethod(name, mods) which adds the specified method name with
its correct modifiers mods to the current context.

— addSwitchExpr(type) which stores the type of the current switch expression.

— switchExpr() which returns the type of the current switch expression.

— isInstanceMethod() which returns true if the current context is within
an instance method.

— isClassMethod() which returns true if the current context is within a
class method.

— isConstructor() which returns true if the current context is within a
constructor method.

— className() which returns the string representation of the current class.

— getClass() which returns the reference of the current class.

— getSuper() which returns the reference of the super class of the current
class.

env This defines the “environment” of the program, basically the definition of
all types, class fields and class definitions accessible by the current code. This
attribute is inherited by code, but synthesized by the declaration aspects of
the code. For a truly correct environment, the compiler must first parse all
relevant declarations to build the top-level environment. Then the compiler
can use this information in the second pass to evaluate expressions and
statements. Without these two passes, all information must be declared prior
to its use. To compress the presentation of the grammar in this chapter, we
have combined the two passes of the compiler into one presentation and have
greatly simplified the operations of the first pass of the compiler. The method
new() defined below activates the first pass of the compiler and returns its
results for the second pass. The methods of the env attribute are:

— new(CompUnit) which runs the first pass of the compiler on the code,
producing a top-level environment which is used for the second pass of
the compiler. In this environment are definitions of all classes, their fields
and methods, imported classes, and compiler defined environment
information (e.g., classes defined in other files specified on the same command
line to the compiler). This method, in effect, runs the attributed grammar
by ignoring all error checks and returning the environment output
by CompUnit.

— typeCheck(type1, type2) which returns true if type2 is of the type
specified by type1.

— lookupFieldType(ref, id) which returns the type of the field id from
the reference ref.

— lookupFieldValue(ref, id) which returns the value of the field id from
the reference ref if that field is final and was initialized with a constant
expression, otherwise it returns undef.

— isDefined(name) which returns true if name is defined in the current
environment.

— idCheck(PrimaryType, IdType) which returns true if IdType is
unambiguous and acceptable for PrimaryType.

— isLabel (name) which returns true if the specified name is a current 
statement label. 

— addLabel (name) which stores name as a named label in the current 
environment. 

vars This defines the set of local variable declarations and their types. This 
attribute is typically only inherited, the exception being the local variable 
declaration statement which modifies this attribute synthesizing a new one. 

type This attribute is only synthesized to perform the necessary type checking. 
It is synthesized by variable declarations and expressions. The methods of 
this attribute are: 

— insert(item) which adds item to the list.

— equals(type) which returns true if the argument type is the same as
the current attribute's type. This is used by the typeCheck method of
env.

— promotableTo(type) which returns true if the current attribute type is
promotable to the argument type.

— inc() which takes the current array type, increments the number of
dimensions and returns the new array type. If the current type is not an
array type, this method creates a one-dimensional array of the current
type. Note that in this method we do not keep track of the actual size
of each dimension (that is a run-time check).

— inc(num) which takes the current array type, increments the number of
dimensions by num and returns the new array type. If the current type
is not an array type, this method creates a num-dimensional array of
the current type. Note that in this method we do not keep track of the
actual size of each dimension (that is a run-time check).

value This attribute is synthesized from the low-level syntax of the language and
is used to return the actual value associated with language literals, specifically
identifier names and numeric, boolean, string, and character literals
and the null constant. The methods of this attribute are:

— defined() returns true if the value is not undefined.

— XX(value) [for XX one of LT, GT, GE, LE, EQ] returns true if the value
compares correctly with the parameter value (e.g., the value is less than
the parameter for operation LT), and false otherwise.

— bitXX(value) returns the numeric result of performing the specified
bitwise operation (bitAND, bitOR or bitXOR) on the numeric value
and the numeric parameter value.

— bitNOT() returns the numeric result of performing the bitwise
complement operation on the numeric value.

— XX(value) [for XX one of AND, OR or XOR] returns the boolean result
of performing the specified logical operation (AND, OR or XOR) on the
boolean value and the boolean parameter value.

— NOT() returns the boolean result of performing the logical complement
operation on the boolean value.

— XX(value) [for XX one of LS, RSS, RSZ] returns the numeric result
of performing the specified shift operation (<<, >> or >>>) on the
numeric value and the numeric parameter value.

ids This attribute is only synthesized by variable declarations and is a list of 
declared variable ids. 

mods This is the list of modifiers for classes, methods, fields and interfaces. The 
methods of this attribute are: 

— exclusive(list) which returns true if the attribute only contains
modifiers specified in list.

— contains(mod) which returns true if the attribute contains the modifier
specified by mod.

— insert(mod) which adds mod to the list of modifiers.
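
As a rough picture of how the mods attribute might behave, the sketch below implements the three methods over a list of modifier names. The representation (a list of strings) and the chaining return value of insert are assumptions made for this illustration; the chapter does not prescribe an implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: one possible representation of the mods attribute.
class Mods {
    private final List<String> mods = new ArrayList<>();

    /** Adds mod to the list of modifiers; returns this to allow chaining (an assumption). */
    Mods insert(String mod) {
        mods.add(mod);
        return this;
    }

    /** Returns true if the attribute contains the given modifier. */
    boolean contains(String mod) {
        return mods.contains(mod);
    }

    /** Returns true if every modifier held by the attribute appears in allowed. */
    boolean exclusive(List<String> allowed) {
        return allowed.containsAll(mods);
    }

    public static void main(String[] args) {
        Mods classMods = new Mods().insert("public").insert("final");
        List<String> legalForClasses = Arrays.asList("public", "abstract", "final");
        System.out.println(classMods.exclusive(legalForClasses));
        System.out.println(classMods.contains("abstract"));
    }
}
```

A check such as "classes may only be public, abstract and/or final" then reduces to a single call to exclusive with the list of permitted modifiers.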

The following methods are part of the output attribute of a term. They are 
part of a specific output attribute, since they utilize results of more than one 
attribute. 

— assignableTo(type) returns true if the current expression can be converted
to the specified type by assignment conversion.

— isExpression(name) returns true if the parameter name refers to a local
variable or a field accessible in the current context.

— getType(name) returns the type of the parameter name within the current
context (or undef if the type is unresolvable).

— getValue(name) returns the value of the parameter name within the current
context if name refers to a final variable whose initializer was a constant
expression, otherwise it returns undef.

In addition, the following auxiliary functions are used in this grammar:

— binaryNumericConversion(t1, t2) which returns the resultant type after
applying binary numeric conversion (as defined in the JLS) to the two
argument types t1 and t2.

— unaryNumericConversion(type) which returns the resultant type after
applying unary numeric conversion (as defined in the JLS) to the argument type.

— mkArrayType(type) which returns the type equivalent to an array of the
parameter type.

— unmkArrayType(type) which returns the type equivalent to a single element
of the array specified by the parameter type.
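
The auxiliary functions can be sketched in Java as follows. This is an illustration under assumptions: types are represented by their names as strings, the arguments of the numeric conversions are assumed to be numeric primitive type names, and the class name TypeFunctions is hypothetical. The conversion rules follow the numeric promotion rules of the Java language (double, then float, then long, otherwise int).

```java
// Illustrative sketch only: types are modelled as plain strings (an assumption).
class TypeFunctions {
    /** Result type of a binary numeric operation on operands of types t1 and t2. */
    static String binaryNumericConversion(String t1, String t2) {
        if (t1.equals("double") || t2.equals("double")) return "double";
        if (t1.equals("float") || t2.equals("float")) return "float";
        if (t1.equals("long") || t2.equals("long")) return "long";
        return "int"; // byte, short, char and int operands all become int
    }

    /** Result type of a unary numeric operation on an operand of type t. */
    static String unaryNumericConversion(String t) {
        if (t.equals("byte") || t.equals("short") || t.equals("char")) return "int";
        return t; // int, long, float and double are unchanged
    }

    /** The type of an array whose elements have type t. */
    static String mkArrayType(String t) {
        return t + "[]";
    }

    /** The element type of the array type t. */
    static String unmkArrayType(String t) {
        if (!t.endsWith("[]")) {
            throw new IllegalArgumentException("not an array type: " + t);
        }
        return t.substring(0, t.length() - 2);
    }

    public static void main(String[] args) {
        System.out.println(binaryNumericConversion("byte", "short"));
        System.out.println(binaryNumericConversion("int", "float"));
        System.out.println(unmkArrayType(mkArrayType("int[]")));
    }
}
```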

2 The Grammar 

In this section we present the full attributed grammar, for each of the logical 
units of the language defined above. A brief discussion of the attributes of each 
logical unit is provided. 

2.1 Names and Literals 

The following grammar specifies the syntax of names and literals in the Java
language. Specific formatting details of these are not presented here, but rather
are assumed to be those defined in the Java Language Specification. The
name entity in this specification returns a string representation of the name
that is used by the higher level production to determine the appropriate type.
The resulting name/type is returned in the type attribute. Literals, on the other
hand, return the appropriate literal type in the type attribute. Integer literals
also return a value in the value attribute that can be evaluated in the assignment
statement. This permits a direct assignment of a small integer to shorts, chars
and bytes.
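
The Java behaviour that motivates carrying a value attribute for integer literals is sketched below. The class and field names (ConstantAssignDemo, SMALL) are hypothetical; the commented-out lines are shown only as examples of assignments that a Java compiler rejects.

```java
// Illustrative sketch only: hypothetical names.
class ConstantAssignDemo {
    static final int SMALL = 10; // a final variable initialized with a constant expression

    static int sum() {
        byte b = 100;    // allowed: the int literal 100 fits in a byte
        short s = 1000;  // allowed: 1000 fits in a short
        char c = 65;     // allowed: 65 fits in a char ('A')
        byte k = SMALL;  // allowed: SMALL is a constant whose value fits in a byte

        // byte tooBig = 200;            // rejected: 200 does not fit in a byte
        // int n = 100; byte fromVar = n; // rejected: n is not a constant expression

        return b + s + c + k;
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```

Deciding whether such an assignment is legal requires knowing the value of the right-hand side at compile time, not just its type.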



<Name> ::=
    <SimpleName>
        Name.out := SimpleName.out
  | <QualifiedName>
        Name.out := QualifiedName.out

<SimpleName> ::=
    <Id>
        SimpleName.out.type := Id.out.value

<QualifiedName> ::=
    <Name> . <Id>
        QualifiedName.out.type := Name.out.type + “.” + Id.out.value

<Literal> ::=
    <IntLit>
        Literal.out.type := int
        Literal.out.value := IntLit.out.value
  | <FloatLit>
        Literal.out.type := float
        Literal.out.value := FloatLit.out.value
  | <BoolLit>
        Literal.out.type := boolean
        Literal.out.value := BoolLit.out.value
  | <CharLit>
        Literal.out.type := char
        Literal.out.value := CharLit.out.value
  | <StringLit>
        Literal.out.type := java.lang.String
        Literal.out.value := StringLit.out.value
  | <NullLit>
        Literal.out.type := null


2.2 Packages 

The following grammar defines the high-level file syntax of Java programs.
Specifically, this aspect of the grammar is responsible for defining package
membership, class imports and the top-level class and interface specifications.
It is important to remember that all of the type-checking performed within the
method bodies is performed only after all of these top-level definitions are
parsed in the first pass. All the attributes at this level are just passed up and
down the parse tree, with only two changes being made: the name of the current
package is placed into the context (if no package is defined, the current package
is the default package) and class definition imports are added to the environment.

Note that in this specification, there are some optional non-terminals on the
RHS of the productions. The question arises as to how the attribute grammar
handles the synthesized attributes of non-selected optional non-terminals. In this
case, we adopt the convention that all synthesized-only attributes of a
non-selected optional non-terminal are null, and that all inherited and synthesized
attributes take on the value of the inherited attribute.



<Goal> ::=
    <CompUnit>
        CompUnit.in.env := env.new(<CompUnit>)
        CompUnit.in.context := new context()

<CompUnit> ::=
    <PackageDecl>? <ImportDeclList>? <TypeDeclList>?
        PackageDecl.in := CompUnit.in
        ImportDeclList.in.context :=
            CompUnit.in.context.addPackage(PackageDecl.out.type)
        ImportDeclList.in.env := PackageDecl.out.env
        TypeDeclList.in.context := ImportDeclList.in.context
        TypeDeclList.in.env := ImportDeclList.out.env
        CompUnit.out.env := TypeDeclList.out.env
<PackageDecl> ::=
    package <Name> ;
        PackageDecl.out.type := Name.out.type

<ImportDeclList> ::=
    <ImportDecl>
        ImportDecl.in := ImportDeclList.in
        ImportDeclList.out.env := ImportDecl.out.env
  | <ImportDeclList1> <ImportDecl>
        ImportDeclList1.in := ImportDeclList.in
        ImportDecl.in.context := ImportDeclList.in.context
        ImportDecl.in.env := ImportDeclList1.out.env
        ImportDeclList.out.env := ImportDecl.out.env

<TypeDeclList> ::=
    <TypeDecl>
        TypeDecl.in := TypeDeclList.in
        TypeDeclList.out.env := TypeDecl.out.env
  | <TypeDeclList1> <TypeDecl>
        TypeDeclList1.in := TypeDeclList.in
        TypeDecl.in.context := TypeDeclList.in.context
        TypeDecl.in.env := TypeDeclList1.out.env
        TypeDeclList.out.env := TypeDecl.out.env

<ImportDecl> ::=
    <SingleTypeImportDecl>
        SingleTypeImportDecl.in := ImportDecl.in
        ImportDecl.out.env := SingleTypeImportDecl.out.env
  | <TypeImportOnDemandDecl>
        TypeImportOnDemandDecl.in := ImportDecl.in
        ImportDecl.out.env := TypeImportOnDemandDecl.out.env

<SingleTypeImportDecl> ::=
    import <Name> ;
        SingleTypeImportDecl.out.env :=
            SingleTypeImportDecl.in.env.import(Name.out.type)

<TypeImportOnDemandDecl> ::=
    import <Name> . * ;
        TypeImportOnDemandDecl.out.env :=
            TypeImportOnDemandDecl.in.env.importOnDemand(Name.out.type)

<TypeDecl> ::=
    ;
        TypeDecl.out.env := TypeDecl.in.env
  | <ClassDecl>
        ClassDecl.in := TypeDecl.in
        TypeDecl.out.env := ClassDecl.out.env
  | <InterfaceDecl>
        InterfaceDecl.in := TypeDecl.in
        TypeDecl.out.env := InterfaceDecl.out.env



2.3 Types 

The following grammar presents the syntax of type definitions in Java. These 
productions simply pass back up the generated type of the term. If a reference 
type is expected, a compile-time check is made to ensure that the reference is 
defined, otherwise an error occurs. The same is true of array types. 



<Type> ::=
    <PrimType>
        Type.out.type := PrimType.out.type
  | <RefType>
        Type.out.type := RefType.out.type

<PrimType> ::=
    <NumType>
        PrimType.out.type := NumType.out.type
  | boolean
        PrimType.out.type := boolean

<NumType> ::=
    <IntType>
        NumType.out.type := IntType.out.type
  | <FloatType>
        NumType.out.type := FloatType.out.type

<IntType> ::=
    byte
        IntType.out.type := byte
  | short
        IntType.out.type := short
  | int
        IntType.out.type := int
  | long
        IntType.out.type := long
  | char
        IntType.out.type := char

<FloatType> ::=
    float
        FloatType.out.type := float
  | double
        FloatType.out.type := double

<RefType> ::=
    <ClassInterfaceType>
        RefType.out.type := ClassInterfaceType.out.type
  | <ArrayType>
        RefType.out.type := ArrayType.out.type

<ClassInterfaceType> ::=
    <Name>
        ClassInterfaceType.out.type := Name.out.type
        if not(ClassInterfaceType.in.env.isDefined(Name.out.type))
            ERROR(“Undefined Name” + Name.out.type)
            ClassInterfaceType.out.type := null
        endif

<ClassType> ::=
    <ClassInterfaceType>
        ClassType.out.type := ClassInterfaceType.out.type

<InterfaceType> ::=
    <ClassInterfaceType>
        InterfaceType.out.type := ClassInterfaceType.out.type

<ArrayType> ::=
    <PrimType> [ ]
        ArrayType.out.type := mkArrayType(PrimType.out.type)
  | <Name> [ ]
        ArrayType.out.type :=
            mkArrayType(ArrayType.out.env.lookupType(Name.out.value))
  | <ArrayType> [ ]
        ArrayType.out.type := mkArrayType(ArrayType.out.type)

        **** Type check these **



2.4 Modifiers 

The following grammar presents the syntax of modifiers, which return the mods 
attribute as a list of defined modifiers. These modifiers are used for classes, 
fields and methods in a Java file. It was decided to include all modifiers in 
a single grammatical structure here, and to perform restriction checking at a 
higher level; such as the illegal modification of an interface declaration with the 
volatile modifier. This structure does check for illegal duplicate modifiers, a 
condition that is not permitted in any use of modifiers in the Java language. 



<Modifiers> ::=
    <Modifier>
        Modifiers.out.mods := Modifier.out.mods
  | <Modifiers1> <Modifier>
        if Modifiers1.out.mods.contains(Modifier.out.value)
            ERROR(“The modifiers should contain only one instance of” +
                Modifier.out.value)
            Modifiers.out.mods := Modifiers1.out.mods
        else
            Modifiers.out.mods := Modifiers1.out.mods.insert(Modifier.out.value)
        endif



<Modifier> ::=
    public
        Modifier.out.value := public
  | private
        Modifier.out.value := private
  | protected
        Modifier.out.value := protected
  | static
        Modifier.out.value := static
  | abstract
        Modifier.out.value := abstract
  | final
        Modifier.out.value := final
  | native
        Modifier.out.value := native
  | synchronized
        Modifier.out.value := synchronized
  | transient
        Modifier.out.value := transient
  | volatile
        Modifier.out.value := volatile
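
The duplicate check performed by the <Modifiers> rule can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name ModifierCheck and the method collect are hypothetical, modifiers are modeled as plain strings, and errors are collected in a list rather than reported through an ERROR primitive.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper mirroring the <Modifiers> rule: modifiers are gathered
// left to right, and a repeated modifier is reported but not inserted again.
final class ModifierCheck {
    static Set<String> collect(List<String> modifiers, List<String> errors) {
        Set<String> mods = new LinkedHashSet<>();
        for (String m : modifiers) {
            // Set.add returns false when the element is already present.
            if (!mods.add(m)) {
                errors.add("The modifiers should contain only one instance of " + m);
            }
        }
        return mods;
    }
}
```

As in the grammar, the set of modifiers is left unchanged when a duplicate is found, so later checks still see each modifier once.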



2.5 Interface Declarations 

The following grammar presents the syntax for interface declarations. 



<InterfaceDecl> ::=
    <Modifiers>_opt <UnmodInterfaceDecl>

**** Modifiers abstract or public

<UnmodInterfaceDecl> ::=
    interface <Id> <Extends>_opt <InterfaceBody>

<Extends> ::=
    extends <InterfaceType>
  | <Extends> , <InterfaceType>

<InterfaceBody> ::=
    { <InterfaceMemberDeclList>_opt }

<InterfaceMemberDeclList> ::=
    <InterfaceMemberDecl>
  | <InterfaceMemberDeclList> <InterfaceMemberDecl>

<InterfaceMemberDecl> ::=
    <ClassDecl>
  | <InterfaceDecl>
  | <ConstDecl>
  | <AbsMethodDecl>

<ConstDecl> ::=
    <FieldDecl>

**** Public, static and/or final. Field declaration in body of interface is all 3

<AbsMethodDecl> ::=
    <MethodHdr> ;



2.6 Class Declarations 

The following grammar presents class declarations. 



<ClassDecl> ::=
    <Modifiers> <UnmodClassDecl>
        if not (Modifiers.out.mods.exclusive([public, abstract, final]))
            ERROR(“Classes may only be public, abstract and/or final”)
        endif
        if (Modifiers.out.mods.contains(abstract) and
            Modifiers.out.mods.contains(final))
            ERROR(“Classes can not be both abstract and final”)
        endif
        UnmodClassDecl.in.context :=
            ClassDecl.in.context.addClassMods(Modifiers.out.mods)
        UnmodClassDecl.in.env := ClassDecl.in.env

<UnmodClassDecl> ::=
    class <Id> <Super>_opt <Interfaces>_opt <ClassBody>
        let con = UnmodClassDecl.in.context.addClassName(Id.out.value) in
        let con1 = con.addSuper(Super.out.type) in
        let con2 = con1.addInterfaces(Interfaces.out.type) in
        ClassBody.in.context := con2
        ClassBody.in.env := UnmodClassDecl.in.env

<Super> ::=
    extends <ClassType>
        Super.out.type := ClassType.out.type

<Interfaces> ::=
    implements <InterfaceTypeList>
        Interfaces.out.type := InterfaceTypeList.out.type

<InterfaceTypeList> ::=
    <InterfaceType>
        InterfaceTypeList.out.type := InterfaceType.out.type
  | <InterfaceTypeList_1> , <InterfaceType>
        InterfaceTypeList.out.type :=
            InterfaceTypeList_1.out.type.insert(InterfaceType.out.type)

<ClassBody> ::=
    { <ClassBodyDeclList>_opt }
        ClassBodyDeclList.in := ClassBody.in
        ClassBody.out := ClassBodyDeclList.out

<ClassBodyDeclList> ::=
    <ClassBodyDecl>
        ClassBodyDecl.in := ClassBodyDeclList.in
        ClassBodyDeclList.out := ClassBodyDecl.out
  | <ClassBodyDeclList_1> <ClassBodyDecl>
        ClassBodyDeclList_1.in := ClassBodyDeclList.in
        ClassBodyDecl.in := ClassBodyDeclList_1.out
        ClassBodyDeclList.out := ClassBodyDecl.out

<ClassBodyDecl> ::=
    <ClassDecl>
        ClassDecl.in := ClassBodyDecl.in
        ClassBodyDecl.out := ClassDecl.out
        **** Nested classes may be static, abstract, final, public, protected, or private **
  | <InterfaceDecl>
        InterfaceDecl.in := ClassBodyDecl.in
        ClassBodyDecl.out := InterfaceDecl.out
  | <ClassMemberDecl>
        ClassMemberDecl.in := ClassBodyDecl.in
        ClassBodyDecl.out := ClassMemberDecl.out
  | <StaticInit>
        StaticInit.in := ClassBodyDecl.in
        ClassBodyDecl.out := StaticInit.out
  | <ConstrDecl>
        ConstrDecl.in := ClassBodyDecl.in
        ClassBodyDecl.out := ConstrDecl.out

<ClassMemberDecl> ::=
    <FieldDecl>
        FieldDecl.in := ClassMemberDecl.in
        ClassMemberDecl.out := FieldDecl.out
  | <MethodDecl>
        MethodDecl.in := ClassMemberDecl.in
        ClassMemberDecl.out := MethodDecl.out
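
The two modifier restrictions stated in the <ClassDecl> rule can be illustrated with a short Java sketch. The names ClassModifierCheck, Mod and check are hypothetical, and the sketch assumes that the exclusive operation of the grammar means "contains no modifier outside the given list"; the specification does not define it further in this section.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the <ClassDecl> modifier restrictions.
final class ClassModifierCheck {
    enum Mod { PUBLIC, PRIVATE, PROTECTED, STATIC, ABSTRACT, FINAL,
               NATIVE, SYNCHRONIZED, TRANSIENT, VOLATILE }

    // Assumed reading of exclusive([public, abstract, final]).
    private static final Set<Mod> ALLOWED =
            EnumSet.of(Mod.PUBLIC, Mod.ABSTRACT, Mod.FINAL);

    static List<String> check(Set<Mod> mods) {
        List<String> errors = new ArrayList<>();
        if (!ALLOWED.containsAll(mods)) {
            errors.add("Classes may only be public, abstract and/or final");
        }
        // The error applies when BOTH modifiers are present.
        if (mods.contains(Mod.ABSTRACT) && mods.contains(Mod.FINAL)) {
            errors.add("Classes can not be both abstract and final");
        }
        return errors;
    }
}
```

Both checks are independent, so a declaration such as "static abstract final class" would be reported twice, once per violated restriction.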

2.7 Method Declarations

The following grammar presents the syntax for class method declarations. 



<MethodDecl> ::=
    <MethodHdr> <MethodBody>

<MethodHdr> ::=
    <Modifiers>_opt <Type> <MethodDef> <Throws>_opt
  | <Modifiers>_opt void <MethodDef> <Throws>_opt

<MethodDef> ::=
    <Id> ( <FormalParmList>_opt )
  | <MethodDef> [ ]

<FormalParmList> ::=
    <FormalParam>
  | <FormalParmList> , <FormalParam>

<FormalParam> ::=
    <Modifier> <Type> <VarDeclId>

**** Modifier may be final **

<Throws> ::=
    throws <ClassTypeList>

<ClassTypeList> ::=
    <ClassType>
  | <ClassTypeList> , <ClassType>

<MethodBody> ::=
    ;
  | <Block>



2.8 Field and Variable Declarations 

The following grammar presents the syntax for class field declarations and
variable declarations.



<FieldDecl> ::=
    <Modifiers>_opt <Type> <VarDecl> ;

**** Modifiers one of (public, protected, private) final, static, transient, volatile

<VarDeclList> ::=
    <VarDecl>
  | <VarDeclList> , <VarDecl>

<VarDecl> ::=
    <VarDeclId>
  | <VarDeclId> = <VarInit>

*** Need Declared before used check here for Field inits***

<VarDeclId> ::=
    <Id>
  | <VarDeclId> [ ]



2.9 Initializers 

The following grammar presents the syntax for variable and array initializers. 



<StaticInit> ::=
    static <Block>

<VarInits> ::=
    <VarInit>
  | <VarInits> , <VarInit>

*** Need Declared before used check here for Field inits***

<ArrayInit> ::=
    { <VarInits>_opt ,_opt }

<VarInit> ::=
    <Expr>
  | <ArrayInit>


2.10 Constructor Declarations 

The following grammar presents the syntax for constructors. 



<ConstrDecl> ::=
    <Modifiers>_opt <ConstrDef> <Throws>_opt <ConstrBody>

**** Modifiers one of public, private, protected

<ConstrDef> ::=
    <SimpleName> ( <FormalParmList>_opt )

<ConstrBody> ::=
    { <ExplConstrInv>_opt <BlockStmtList>_opt }

<ExplConstrInv> ::=
    this ( <ArgList>_opt ) ;
  | super ( <ArgList>_opt ) ;



2.11 Blocks and Statements 

The following grammar presents the syntax for statements and blocks in the Java
language. The pertinent attributes of blocks and statements are the environment
(env) and local variable (vars) attributes.



<Block> ::=
    { <BlockStmtList>_opt }
        BlockStmtList.in.context := Block.in.context
        BlockStmtList.in.env := Block.in.env
        BlockStmtList.in.vars := Block.in.vars.newBlock()
        Block.out.vars := Block.in.vars
        Block.out.env := Block.in.env

<BlockStmtList> ::=
    <BlockStmt>
        BlockStmt.in := BlockStmtList.in
        BlockStmtList.out := BlockStmt.out
  | <BlockStmtList_1> <BlockStmt>
        BlockStmtList_1.in := BlockStmtList.in
        BlockStmt.in.context := BlockStmtList.in.context
        BlockStmt.in.vars := BlockStmtList_1.out.vars
        BlockStmt.in.env := BlockStmtList_1.out.env
        BlockStmtList.out := BlockStmt.out

<BlockStmt> ::=
    <LocalVarDeclStmt>
        LocalVarDeclStmt.in := BlockStmt.in
        BlockStmt.out := LocalVarDeclStmt.out
  | <Statement>
        Statement.in := BlockStmt.in
        BlockStmt.out := Statement.out
  | <UnmodClassDecl>
        UnmodClassDecl.in := BlockStmt.in
        BlockStmt.out := UnmodClassDecl.out

<LocalVarDeclStmt> ::=
    <LocalVarDecl> ;
        LocalVarDecl.in := LocalVarDeclStmt.in
        LocalVarDeclStmt.out := LocalVarDecl.out

<LocalVarDecl> ::=
    <Type> <VarDeclList>
        Type.in := LocalVarDecl.in
        VarDeclList.in := LocalVarDecl.in
        LocalVarDecl.out.vars :=
            LocalVarDecl.in.vars.insert(Type.out.type, VarDeclList.out.ids)
        if DeclConflict(Type.out.type, VarDeclList.out.ids, LocalVarDecl.in.vars)
            ERROR(“Illegal Local Variable Declaration”)
        endif
        LocalVarDecl.out.env := LocalVarDecl.in.env
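
The interplay of newBlock, insert and DeclConflict on the vars attribute can be sketched in Java. This is an assumption-laden illustration rather than the authors' data structure: the class LocalVars and its methods are hypothetical, types are modeled as strings, and the table is mutable, whereas the attribute grammar threads an immutable vars value through the tree. A conflict is assumed to mean that the name is already declared in the current block or any enclosing block, which matches the Java rule that a local variable may not redeclare an enclosing local.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical scoped table of local variables (name -> type).
final class LocalVars {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    LocalVars() {
        scopes.push(new HashMap<>());
    }

    // Entering a block opens a fresh innermost scope.
    void newBlock() {
        scopes.push(new HashMap<>());
    }

    // Leaving a block discards its declarations, like Block.out.vars := Block.in.vars.
    void endBlock() {
        scopes.pop();
    }

    // True if the name is visible in the current or any enclosing scope.
    boolean declConflict(String name) {
        for (Map<String, String> scope : scopes) {
            if (scope.containsKey(name)) {
                return true;
            }
        }
        return false;
    }

    // Inserts the declaration unless it conflicts; returns whether it was inserted.
    boolean insert(String type, String name) {
        if (declConflict(name)) {
            return false;
        }
        scopes.peek().put(name, type);
        return true;
    }
}
```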

<Statement> ::=
    <StmtNoTrailing>
        StmtNoTrailing.in := Statement.in
  | <LabeledStmt>
        LabeledStmt.in := Statement.in
  | <IfStmt>
        IfStmt.in := Statement.in
  | <IfElseStmt>
        IfElseStmt.in := Statement.in
  | <WhileStmt>
        WhileStmt.in := Statement.in
  | <ForStmt>
        ForStmt.in := Statement.in

<StmtNoShortIf> ::=
    <StmtNoTrailing>
        StmtNoTrailing.in := StmtNoShortIf.in
  | <LabeledStmtNoShortIf>
        LabeledStmtNoShortIf.in := StmtNoShortIf.in
  | <IfElseStmtNoShortIf>
        IfElseStmtNoShortIf.in := StmtNoShortIf.in
  | <WhileStmtNoShortIf>
        WhileStmtNoShortIf.in := StmtNoShortIf.in
  | <ForStmtNoShortIf>
        ForStmtNoShortIf.in := StmtNoShortIf.in

<StmtNoTrailing> ::=
    <Block>
        Block.in := StmtNoTrailing.in
  | <EmptyStmt>
        EmptyStmt.in := StmtNoTrailing.in
  | <ExprStmt>
        ExprStmt.in := StmtNoTrailing.in
  | <SwitchStmt>
        SwitchStmt.in := StmtNoTrailing.in
  | <DoStmt>
        DoStmt.in := StmtNoTrailing.in
  | <BreakStmt>
        BreakStmt.in := StmtNoTrailing.in
  | <ContStmt>
        ContStmt.in := StmtNoTrailing.in
  | <RetStmt>
        RetStmt.in := StmtNoTrailing.in
  | <SynchStmt>
        SynchStmt.in := StmtNoTrailing.in
  | <ThrowStmt>
        ThrowStmt.in := StmtNoTrailing.in
  | <TryStmt>
        TryStmt.in := StmtNoTrailing.in

<EmptyStmt> ::=
    ;



<LabeledStmt> ::=
    <Id> : <Statement>
        Statement.in.context := LabeledStmt.in.context
        Statement.in.vars := LabeledStmt.in.vars
        if (LabeledStmt.in.env.isLabel(Id.out.value))
            ERROR(“Label ” + Id.out.value + “ already in use.”)
            Statement.in.env := LabeledStmt.in.env
        else
            Statement.in.env := LabeledStmt.in.env.addLabel(Id.out.value)
        endif

<LabeledStmtNoShortIf> ::=
    <Id> : <StmtNoShortIf>
        StmtNoShortIf.in.context := LabeledStmtNoShortIf.in.context
        StmtNoShortIf.in.vars := LabeledStmtNoShortIf.in.vars
        if (LabeledStmtNoShortIf.in.env.isLabel(Id.out.value))
            ERROR(“Label ” + Id.out.value + “ already in use.”)
            StmtNoShortIf.in.env := LabeledStmtNoShortIf.in.env
        else
            StmtNoShortIf.in.env := LabeledStmtNoShortIf.in.env.addLabel(Id.out.value)
        endif
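
The label handling above treats the environment as a value: addLabel yields a new environment that is passed only to the labeled statement, so the label is not visible to sibling statements. A minimal Java sketch of such an environment follows; the class name LabelEnv is hypothetical and only the isLabel and addLabel operations used by the grammar are modeled.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical immutable label environment.
final class LabelEnv {
    private final Set<String> labels;

    LabelEnv() {
        this(Collections.<String>emptySet());
    }

    private LabelEnv(Set<String> labels) {
        this.labels = labels;
    }

    // True if the label is in scope in this environment.
    boolean isLabel(String id) {
        return labels.contains(id);
    }

    // Returns a NEW environment; the receiver is left unchanged.
    LabelEnv addLabel(String id) {
        Set<String> copy = new HashSet<>(labels);
        copy.add(id);
        return new LabelEnv(copy);
    }
}
```

With this shape, the "already in use" check is simply isLabel on the inherited environment, and the break and continue rules can use the same query to detect an undefined label.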

<ExprStmt> ::=
    <Assign>
        Assign.in := ExprStmt.in
  | <PreIncExpr>
        PreIncExpr.in := ExprStmt.in
  | <PreDecExpr>
        PreDecExpr.in := ExprStmt.in
  | <PostIncExpr>
        PostIncExpr.in := ExprStmt.in
  | <PostDecExpr>
        PostDecExpr.in := ExprStmt.in
  | <MethodInv>
        MethodInv.in := ExprStmt.in
  | <ClassInstCreationExpr>
        ClassInstCreationExpr.in := ExprStmt.in







Jim Alves-Foss and Deborah Frincke 



<IfStmt> ::=
    if ( <Expr> ) <Statement>
        Expr.in := IfStmt.in
        Statement.in := IfStmt.in
        if not (Expr.out.type.equals(boolean))
            ERROR(“Condition of if statement must be boolean”)
        endif

<IfElseStmt> ::=
    if ( <Expr> ) <StmtNoShortIf> else <Statement>
        Expr.in := IfElseStmt.in
        StmtNoShortIf.in := IfElseStmt.in
        Statement.in := IfElseStmt.in
        if not (Expr.out.type.equals(boolean))
            ERROR(“Condition of if statement must be boolean”)
        endif

<IfElseStmtNoShortIf> ::=
    if ( <Expr> ) <StmtNoShortIf_1> else <StmtNoShortIf_2>
        Expr.in := IfElseStmtNoShortIf.in
        StmtNoShortIf_1.in := IfElseStmtNoShortIf.in
        StmtNoShortIf_2.in := IfElseStmtNoShortIf.in
        if not (Expr.out.type.equals(boolean))
            ERROR(“Condition of if statement must be boolean”)
        endif

<SwitchStmt> ::=
    switch ( <Expr> ) <SwitchBlock>
        Expr.in := SwitchStmt.in
        SwitchBlock.in.env := SwitchStmt.in.env
        if not (Expr.out.type.equals(integral))
            ERROR(“Switch statement expression must be integral”)
            SwitchBlock.in.context := SwitchStmt.in.context.addSwitchExpr(int)
        else
            SwitchBlock.in.context :=
                SwitchStmt.in.context.addSwitchExpr(Expr.out.type)
        endif

<SwitchBlock> ::=
    { <SwitchBlockStmtList>_opt <SwitchLabelList>_opt }
        SwitchBlockStmtList.in := SwitchBlock.in
        SwitchLabelList.in := SwitchBlock.in

<SwitchBlockStmtList> ::=
    <SwitchBlockStmt>
        SwitchBlockStmt.in := SwitchBlockStmtList.in
  | <SwitchBlockStmtList_1> <SwitchBlockStmt>
        SwitchBlockStmtList_1.in := SwitchBlockStmtList.in
        SwitchBlockStmt.in := SwitchBlockStmtList.in

<SwitchBlockStmt> ::=
    <SwitchLabelList> <BlockStmtList>
        SwitchLabelList.in := SwitchBlockStmt.in
        BlockStmtList.in := SwitchBlockStmt.in

<SwitchLabelList> ::=
    <SwitchLabel>
        SwitchLabel.in := SwitchLabelList.in
  | <SwitchLabelList_1> <SwitchLabel>
        SwitchLabelList_1.in := SwitchLabelList.in
        SwitchLabel.in := SwitchLabelList.in

<SwitchLabel> ::=
    case <ConstExpr> :
        ConstExpr.in := SwitchLabel.in
        if not (ConstExpr.out.assignableTo(SwitchLabel.in.context.switchExpr()))
            ERROR(“Case label must be compatible with switch expression type.”)
        endif
  | default :

<WhileStmt> ::=
    while ( <Expr> ) <Statement>
        Expr.in := WhileStmt.in
        Statement.in := WhileStmt.in
        if not (Expr.out.type.equals(boolean))
            ERROR(“While statement expression must be boolean”)
        endif

<WhileStmtNoShortIf> ::=
    while ( <Expr> ) <StmtNoShortIf>
        Expr.in := WhileStmtNoShortIf.in
        StmtNoShortIf.in := WhileStmtNoShortIf.in
        if not (Expr.out.type.equals(boolean))
            ERROR(“While statement expression must be boolean”)
        endif

<DoStmt> ::=
    do <Statement> while ( <Expr> )
        Expr.in := DoStmt.in
        Statement.in := DoStmt.in
        if not (Expr.out.type.equals(boolean))
            ERROR(“Do statement expression must be boolean”)
        endif

<ForStmt> ::=
    for ( <ForInit>_opt ; <Expr>_opt ; <ForUpdate>_opt ) <Statement>
        ForInit.in := ForStmt.in
        Expr.in.context := ForStmt.in.context
        Expr.in.env := ForStmt.in.env
        Expr.in.vars := ForInit.out.vars
        ForUpdate.in.context := ForStmt.in.context
        ForUpdate.in.env := ForStmt.in.env
        ForUpdate.in.vars := ForInit.out.vars
        Statement.in.context := ForStmt.in.context
        Statement.in.env := ForStmt.in.env
        Statement.in.vars := ForInit.out.vars

<ForStmtNoShortIf> ::=
    for ( <ForInit>_opt ; <Expr>_opt ; <ForUpdate>_opt ) <StmtNoShortIf>
        ForInit.in := ForStmtNoShortIf.in
        Expr.in.context := ForStmtNoShortIf.in.context
        Expr.in.env := ForStmtNoShortIf.in.env
        Expr.in.vars := ForInit.out.vars
        ForUpdate.in.context := ForStmtNoShortIf.in.context
        ForUpdate.in.env := ForStmtNoShortIf.in.env
        ForUpdate.in.vars := ForInit.out.vars
        StmtNoShortIf.in.context := ForStmtNoShortIf.in.context
        StmtNoShortIf.in.env := ForStmtNoShortIf.in.env
        StmtNoShortIf.in.vars := ForInit.out.vars

<ForInit> ::=
    <ExprStmtList>
        ExprStmtList.in := ForInit.in
        ForInit.out.vars := ExprStmtList.out.vars
  | <LocalVarDecl>
        LocalVarDecl.in := ForInit.in
        ForInit.out.vars := LocalVarDecl.out.vars

<ForUpdate> ::=
    <ExprStmtList>
        ExprStmtList.in := ForUpdate.in

<ExprStmtList> ::=
    <ExprStmt>
        ExprStmt.in := ExprStmtList.in
  | <ExprStmtList_1> , <ExprStmt>
        ExprStmt.in := ExprStmtList.in
        ExprStmtList_1.in := ExprStmtList.in

<BreakStmt> ::=
    break <Id>_opt ;
        Id.in := BreakStmt.in
        if not (BreakStmt.in.env.isLabel(Id.out.value))
            ERROR(“Undefined Label ” + Id.out.value + “ in Break statement”)
        endif

<ContStmt> ::=
    continue <Id>_opt ;
        Id.in := ContStmt.in
        if not (ContStmt.in.env.isLabel(Id.out.value))
            ERROR(“Undefined Label ” + Id.out.value + “ in Continue statement”)
        endif

<RetStmt> ::=
    return <Expr>_opt ;
        Expr.in := RetStmt.in
        if not (Expr.out.assignableTo(RetStmt.in.context.returnType()))
            ERROR(Expr.out.type + “ not compatible with return type”)
        endif

<ThrowStmt> ::=
    throw <Expr> ;
        Expr.in := ThrowStmt.in
        if not (ThrowStmt.in.context.throws(Expr.out.type))
            ERROR(“Statement does not throw exception: ” + Expr.out.type)
        endif

<SynchStmt> ::=
    synchronized ( <Expr> ) <Block>
        Expr.in := SynchStmt.in
        Block.in := SynchStmt.in
        if not (Expr.out.type.equals(ref))
            ERROR(“Argument of synchronized statement must be reference type”)
        endif

<TryStmt> ::=
    try <Block> <Catches>
        Block.in := TryStmt.in
        Catches.in := TryStmt.in
  | try <Block> <Catches>_opt <Finally>
        Block.in := TryStmt.in
        Catches.in := TryStmt.in
        Finally.in := TryStmt.in

<Catches> ::=
    <CatchClause>
        CatchClause.in := Catches.in
  | <Catches_1> <CatchClause>
        Catches_1.in := Catches.in
        CatchClause.in := Catches.in

<CatchClause> ::=
    catch ( <FormalParam> ) <Block>
        FormalParam.in := CatchClause.in
        Block.in := CatchClause.in
        if not (FormalParam.out.type.promotableTo(Throwable))
            ERROR(“Catch clause parameter must be of type throwable.”)
        endif

<Finally> ::=
    finally <Block>
        Block.in := Finally.in



2.12 Expressions 

The following grammar presents the syntax for expressions in the Java language.
For expressions, the pertinent output (synthesized) attributes are types and
values; the input (inherited) attributes are context, environment and variables.

The JLS specifies the types of expressions, dependent on the types of the
subexpressions and the form of the expression. However, in the case where there
is a compile-time error (e.g., a type mismatch error), the JLS does not specify
either a default or calculated return type. This leaves compiler writers free to
make their own interpretation of the return type, which can result in
incompatible behavior during compilation when compile-time errors are present.
In this specification we have chosen return types that either follow the
convention of the Sun JDK or give a relatively intuitive result. For select
expressions, the return type and value are both undef, an undefined value. For
type checking methods, an undefined type is compatible with all types. For these
errors, we have made the following decisions:

- For the conditional expression <CondExpr>, experimentation with the Sun
  JDK indicates that the resulting type is the type of the rightmost expression
  <CondExpr_1>. We followed that precedent in this specification.

- For the overloaded operators |, &, and ^, which can be used for either
  boolean operations or for numeric bit-wise operations, we follow the
  convention that the expected and return types are boolean.

- For shift, addition, subtraction and multiplication operations, the default
  return type is int.

- For the or, and, and xor operations the default value is boolean (even if the
  programmer intended a bitwise operation).
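
To make the default-type convention concrete, the typing of the overloaded | operator can be sketched in Java. The class OrTyping, the Type enumeration and the method typeOfOr are hypothetical names introduced for illustration; the integral case assumes the usual binary numeric promotion of the JLS (long if either operand is long, otherwise int).

```java
import java.util.List;

// Hypothetical sketch: result type of "l | r", with boolean as the error default.
final class OrTyping {
    enum Type { BOOLEAN, BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, REF }

    static boolean isIntegral(Type t) {
        switch (t) {
            case BYTE: case SHORT: case CHAR: case INT: case LONG:
                return true;
            default:
                return false;
        }
    }

    static Type typeOfOr(Type l, Type r, List<String> errors) {
        if (isIntegral(l) && isIntegral(r)) {
            // Bitwise use: binary numeric promotion on integral operands.
            return (l == Type.LONG || r == Type.LONG) ? Type.LONG : Type.INT;
        }
        if (l == Type.BOOLEAN && r == Type.BOOLEAN) {
            return Type.BOOLEAN;
        }
        // Mismatch: report the error and fall back to the chosen default type.
        errors.add("Both arguments to | must be boolean or numeric");
        return Type.BOOLEAN;
    }
}
```

Returning a definite type after an error lets checking continue in the enclosing expression without producing a cascade of follow-on messages.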



<ConstExpr> ::=
    <Expr>
        Expr.in := ConstExpr.in
        ConstExpr.out := Expr.out

<Expr> ::=
    <AssignExpr>
        AssignExpr.in := Expr.in
        Expr.out := AssignExpr.out

<AssignExpr> ::=
    <Assign>
        Assign.in := AssignExpr.in
        AssignExpr.out := Assign.out
  | <CondExpr>
        CondExpr.in := AssignExpr.in
        AssignExpr.out := CondExpr.out

<Assign> ::=
    <LHS> <AssignOp> <AssignExpr>
        LHS.in := Assign.in
        AssignExpr.in := Assign.in
        if (AssignOp.out.value == EQ)
            if not (AssignExpr.out.assignableTo(LHS.out.type))
                ERROR(“Assignment conversion error, cannot convert ” +
                      AssignExpr.out.type + “ to ” + LHS.out.type)
            endif
        else if (AssignOp.out.value == NumEQ)
            if not (LHS.out.type.equals(numeric) and
                    AssignExpr.out.type.equals(numeric))
                ERROR(“Operands of ” + AssignOp + “ must be numeric”)
            endif
        else // AssignOp.out.value == BitEQ
            if not (LHS.out.type.equals(numeric) and
                    AssignExpr.out.type.equals(numeric)) and
               not (LHS.out.type.equals(boolean) and
                    AssignExpr.out.type.equals(boolean))
                ERROR(“Operands of ” + AssignOp + “ must be
                      both either boolean or numeric”)
            endif
        endif
        Assign.out.type := LHS.out.type
        Assign.out.value := undef

<LHS> ::=
    <Name>
        Name.in := LHS.in
        LHS.out := Name.out
  | <FieldAccess>
        FieldAccess.in := LHS.in
        LHS.out := FieldAccess.out
  | <ArrayAccess>
        ArrayAccess.in := LHS.in
        LHS.out := ArrayAccess.out

<AssignOp> ::=
    =
        AssignOp.out.value := EQ
  | *=
        AssignOp.out.value := NumEQ
  | /=
        AssignOp.out.value := NumEQ
  | %=
        AssignOp.out.value := NumEQ
  | +=
        AssignOp.out.value := NumEQ
  | -=
        AssignOp.out.value := NumEQ
  | <<=
        AssignOp.out.value := NumEQ
  | >>=
        AssignOp.out.value := NumEQ
  | >>>=
        AssignOp.out.value := NumEQ
  | &=
        AssignOp.out.value := BitEQ
  | ^=
        AssignOp.out.value := BitEQ
  | |=
        AssignOp.out.value := BitEQ



<CondExpr> ::=
    <CondOrExpr>
        CondOrExpr.in := CondExpr.in
        CondExpr.out := CondOrExpr.out
  | <CondOrExpr> ? <Expr> : <CondExpr_1>
        CondOrExpr.in := CondExpr.in
        Expr.in := CondExpr.in
        CondExpr_1.in := CondExpr.in

        // Check type of conditional expression and evaluate
        if not (CondOrExpr.out.type.equals(boolean))
            ERROR(“Expression on LHS of ? must be boolean”)
        else if (CondOrExpr.out.value == undef)
            CondExpr.out.value := undef
        else
            if (CondOrExpr.out.value == true)
                CondExpr.out.value := Expr.out.value
            else
                CondExpr.out.value := CondExpr_1.out.value
            endif
        endif

        // Handle case if both right-hand subexpressions are boolean
        if (Expr.out.type.equals(boolean) and
            CondExpr_1.out.type.equals(boolean))
            CondExpr.out.type := boolean

        // Handle case if both right-hand subexpressions are numeric
        else if (Expr.out.type.equals(numeric) and
                 CondExpr_1.out.type.equals(numeric))
            if (Expr.out.type.equals(CondExpr_1.out.type))
                CondExpr.out.type := Expr.out.type
            else if ((Expr.out.type.equals(byte) and
                      CondExpr_1.out.type.equals(short)) or
                     (Expr.out.type.equals(short) and
                      CondExpr_1.out.type.equals(byte)))
                CondExpr.out.type := short
            else if (Expr.out.type.inList([short;char;byte]) and
                     CondExpr_1.out.assignableTo(Expr.out.type))
                CondExpr.out.type := Expr.out.type
            else if (CondExpr_1.out.type.inList([short;char;byte]) and
                     Expr.out.assignableTo(CondExpr_1.out.type))
                CondExpr.out.type := CondExpr_1.out.type
            else
                CondExpr.out.type :=
                    binaryNumericConversion(Expr.out.type, CondExpr_1.out.type)
            endif

        // Handle case if both right-hand subexpressions are references
        else if (Expr.out.type.equals(ref) and
                 CondExpr_1.out.type.equals(ref))
            if (Expr.out.type.promotableTo(CondExpr_1.out.type))
                CondExpr.out.type := CondExpr_1.out.type
            else if (CondExpr_1.out.type.promotableTo(Expr.out.type))
                CondExpr.out.type := Expr.out.type
            else
                ERROR(“Can’t convert ” + Expr.out.type + “ to ” +
                      CondExpr_1.out.type)
                CondExpr.out.type := Expr.out.type
            endif
        else
            ERROR(“Can’t convert ” + Expr.out.type + “ to ” + CondExpr_1.out.type)
            CondExpr.out.type := Expr.out.type
        endif

<CondOrExpr> ::=
    <CondAndExpr>
        CondAndExpr.in := CondOrExpr.in
        CondOrExpr.out := CondAndExpr.out
  | <CondOrExpr_1> || <CondAndExpr>
        CondOrExpr_1.in := CondOrExpr.in
        CondAndExpr.in := CondOrExpr.in
        if not (CondOrExpr_1.out.type.equals(boolean) and
                CondAndExpr.out.type.equals(boolean))
            ERROR(“Both arguments to || must be boolean”)
            CondOrExpr.out.value := undef
        else if not (CondOrExpr_1.out.value.defined())
            CondOrExpr.out.value := undef
        else if (CondOrExpr_1.out.value == true)
            CondOrExpr.out.value := true
        else if not (CondAndExpr.out.value.defined())
            CondOrExpr.out.value := undef
        else if (CondAndExpr.out.value == true)
            CondOrExpr.out.value := true
        else
            CondOrExpr.out.value := false
        endif
        CondOrExpr.out.type := boolean
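
The value computation for || is a left-to-right evaluation over three possible states: true, false and undef. A minimal Java sketch follows, assuming that a null Boolean stands for undef; the class CondOrFold and the method or are hypothetical names, and the type check that precedes the value computation in the grammar is omitted.

```java
// Hypothetical sketch of the compile-time value rule for "left || right".
// A null Boolean models the undef value of the specification.
final class CondOrFold {
    static Boolean or(Boolean left, Boolean right) {
        if (left == null) {
            return null;            // left operand unknown: result is undef
        }
        if (left) {
            return Boolean.TRUE;    // short circuit: right operand is irrelevant
        }
        if (right == null) {
            return null;            // left is false and right is unknown
        }
        return right;               // left is false: result equals right
    }
}
```

Note the asymmetry that follows from the order of the tests: a known true on the left yields true even if the right operand is undef, but an undef on the left yields undef even if the right operand is known to be true.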

<CondAndExpr> ::=
    <IncOrExpr>
        IncOrExpr.in := CondAndExpr.in
        CondAndExpr.out := IncOrExpr.out
  | <CondAndExpr_1> && <IncOrExpr>
        CondAndExpr_1.in := CondAndExpr.in
        IncOrExpr.in := CondAndExpr.in
        if not (CondAndExpr_1.out.type.equals(boolean) and
                IncOrExpr.out.type.equals(boolean))
            ERROR(“Both arguments to && must be boolean”)
            CondAndExpr.out.value := undef
        else if not (CondAndExpr_1.out.value.defined())
            CondAndExpr.out.value := undef
        else if (CondAndExpr_1.out.value == false)
            CondAndExpr.out.value := false
        else if not (IncOrExpr.out.value.defined())
            CondAndExpr.out.value := undef
        else if (IncOrExpr.out.value == true)
            CondAndExpr.out.value := true
        else
            CondAndExpr.out.value := false
        endif
        CondAndExpr.out.type := boolean

<IncOrExpr> ::=
    <XORExpr>
        XORExpr.in := IncOrExpr.in
        IncOrExpr.out := XORExpr.out
  | <IncOrExpr_1> | <XORExpr>
        IncOrExpr_1.in := IncOrExpr.in
        XORExpr.in := IncOrExpr.in
        if (IncOrExpr_1.out.type.equals(integral) and
            XORExpr.out.type.equals(integral))
            IncOrExpr.out.type :=
                binaryNumericConversion(IncOrExpr_1.out.type, XORExpr.out.type)
            IncOrExpr.out.value := IncOrExpr_1.out.value.bitOR(XORExpr.out.value)
        else if not (IncOrExpr_1.out.type.equals(boolean) and
                     XORExpr.out.type.equals(boolean))
            ERROR(“Both arguments to | must be boolean or numeric”)
            IncOrExpr.out.value := undef
            IncOrExpr.out.type := boolean
        else
            IncOrExpr.out.value := IncOrExpr_1.out.value.OR(XORExpr.out.value)
            IncOrExpr.out.type := boolean
        endif

<XORExpr> ::=
    <AndExpr>
        AndExpr.in := XORExpr.in
        XORExpr.out := AndExpr.out
  | <XORExpr_1> ^ <AndExpr>
        XORExpr_1.in := XORExpr.in
        AndExpr.in := XORExpr.in
        if (XORExpr_1.out.type.equals(integral) and
            AndExpr.out.type.equals(integral))
            XORExpr.out.type :=
                binaryNumericConversion(XORExpr_1.out.type, AndExpr.out.type)
            XORExpr.out.value := XORExpr_1.out.value.bitXOR(AndExpr.out.value)
        else if not (XORExpr_1.out.type.equals(boolean) and
                     AndExpr.out.type.equals(boolean))
            ERROR(“Both arguments to ^ must be boolean or numeric”)
            XORExpr.out.value := undef
            XORExpr.out.type := boolean
        else
            XORExpr.out.value := XORExpr_1.out.value.XOR(AndExpr.out.value)
            XORExpr.out.type := boolean
        endif

<AndExpr> ::=
    <EqualExpr>
        EqualExpr.in := AndExpr.in
        AndExpr.out := EqualExpr.out
  | <AndExpr_1> & <EqualExpr>
        AndExpr_1.in := AndExpr.in
        EqualExpr.in := AndExpr.in
        if (AndExpr_1.out.type.equals(integral) and
            EqualExpr.out.type.equals(integral))
            AndExpr.out.type :=
                binaryNumericConversion(AndExpr_1.out.type, EqualExpr.out.type)
            AndExpr.out.value := AndExpr_1.out.value.bitAND(EqualExpr.out.value)
        else if not (AndExpr_1.out.type.equals(boolean) and
                     EqualExpr.out.type.equals(boolean))
            ERROR(“Both arguments to & must be boolean or numeric”)
            AndExpr.out.value := undef
            AndExpr.out.type := boolean
        else
            AndExpr.out.value := AndExpr_1.out.value.AND(EqualExpr.out.value)
            AndExpr.out.type := boolean
        endif

<EqualExpr> ::=
    <RelatExpr>
        RelatExpr.in := EqualExpr.in
        EqualExpr.out := RelatExpr.out
  | <EqualExpr_1> == <RelatExpr>
        EqualExpr.out.type := boolean
        if not (EqualExpr_1.out.type.compatibleWith(RelatExpr.out.type))
            ERROR(“Operands of == must be compatible types”)
            EqualExpr.out.value := undef
        else
            EqualExpr.out.value := EqualExpr_1.out.value.EQ(RelatExpr.out.value)
        endif
  | <EqualExpr_1> != <RelatExpr>
        EqualExpr.out.type := boolean
        if not (EqualExpr_1.out.type.compatibleWith(RelatExpr.out.type))
            ERROR(“Operands of != must be compatible types”)
            EqualExpr.out.value := undef
        else
            EqualExpr.out.value := not (EqualExpr_1.out.value.EQ(RelatExpr.out.value))
        endif

<RelatExpr> ::=
    <ShiftExpr>
        ShiftExpr.in := RelatExpr.in
        RelatExpr.out := ShiftExpr.out
  | <RelatExpr_1> < <ShiftExpr>
        RelatExpr_1.in := RelatExpr.in
        ShiftExpr.in := RelatExpr.in
        if (TypeCheck(numeric, RelatExpr_1.out.type) and
            TypeCheck(numeric, ShiftExpr.out.type))
            RelatExpr.out.type := boolean
            RelatExpr.out.value := RelatExpr_1.out.value.LT(ShiftExpr.out.value)
        else
            ERROR(“Both arguments to < must be numeric type”)
            RelatExpr.out.type := boolean
            RelatExpr.out.value := undef
        endif
  | <RelatExpr_1> > <ShiftExpr>
        RelatExpr_1.in := RelatExpr.in
        ShiftExpr.in := RelatExpr.in
        if (TypeCheck(numeric, RelatExpr_1.out.type) and
            TypeCheck(numeric, ShiftExpr.out.type))
            RelatExpr.out.type := boolean
            RelatExpr.out.value := RelatExpr_1.out.value.GT(ShiftExpr.out.value)
        else
            ERROR(“Both arguments to > must be numeric type”)
            RelatExpr.out.type := boolean
            RelatExpr.out.value := undef
        endif
  | <RelatExpr_1> <= <ShiftExpr>
        RelatExpr_1.in := RelatExpr.in
        ShiftExpr.in := RelatExpr.in
        if (TypeCheck(numeric, RelatExpr_1.out.type) and
            TypeCheck(numeric, ShiftExpr.out.type))
            RelatExpr.out.type := boolean
            RelatExpr.out.value := RelatExpr_1.out.value.LE(ShiftExpr.out.value)
        else
            ERROR(“Both arguments to <= must be numeric type”)
            RelatExpr.out.type := boolean
            RelatExpr.out.value := undef
        endif
  | <RelatExpr_1> >= <ShiftExpr>
        RelatExpr_1.in := RelatExpr.in
        ShiftExpr.in := RelatExpr.in
        if (TypeCheck(numeric, RelatExpr_1.out.type) and
            TypeCheck(numeric, ShiftExpr.out.type))
            RelatExpr.out.type := boolean
            RelatExpr.out.value := RelatExpr_1.out.value.GE(ShiftExpr.out.value)
        else
            ERROR(“Both arguments to >= must be numeric type”)
            RelatExpr.out.type := boolean
            RelatExpr.out.value := undef
        endif
  | <RelatExpr_1> instanceof <RefType>
        RelatExpr_1.in := RelatExpr.in
        RefType.in := RelatExpr.in
        if (TypeCheck(refOrNull, RelatExpr_1.out.type) and
            TypeCheck(ref, RefType.out.type) and
            RefType.out.type.promotableTo(RelatExpr_1.out.type))
            RelatExpr.out.type := boolean
            RelatExpr.out.value := undef
        else
            ERROR(“Impossible for ” + RelatExpr_1.out.type +
                  “ to be instance of ” + RefType.out.type)
            RelatExpr.out.type := boolean
            RelatExpr.out.value := undef
        endif

<ShiftExpr> ::=
    <AddExpr>
        AddExpr.in := ShiftExpr.in
        ShiftExpr.out := AddExpr.out
  | <ShiftExpr_1> << <AddExpr>
        ShiftExpr_1.in := ShiftExpr.in
        AddExpr.in := ShiftExpr.in
        if (TypeCheck(integral, ShiftExpr_1.out.type) and
            TypeCheck(integral, AddExpr.out.type))
            ShiftExpr.out.type := promote(ShiftExpr_1.out.type, AddExpr.out.type)
            ShiftExpr.out.value := ShiftExpr_1.out.value.LS(AddExpr.out.value)
        else
            ERROR(“Both arguments to << must be integral type”)
            ShiftExpr.out.type := int
            ShiftExpr.out.value := undef
        endif
  | <ShiftExpr_1> >> <AddExpr>
        ShiftExpr_1.in := ShiftExpr.in
        AddExpr.in := ShiftExpr.in
        if (TypeCheck(integral, ShiftExpr_1.out.type) and
            TypeCheck(integral, AddExpr.out.type))
            ShiftExpr.out.type := promote(ShiftExpr_1.out.type, AddExpr.out.type)
            ShiftExpr.out.value := ShiftExpr_1.out.value.RSS(AddExpr.out.value)
        else
            ERROR(“Both arguments to >> must be integral type”)
            ShiftExpr.out.type := int
            ShiftExpr.out.value := undef
        endif
  | <ShiftExpr_1> >>> <AddExpr>
        ShiftExpr_1.in := ShiftExpr.in
        AddExpr.in := ShiftExpr.in
        if (TypeCheck(integral, ShiftExpr_1.out.type) and
            TypeCheck(integral, AddExpr.out.type))
            ShiftExpr.out.type := promote(ShiftExpr_1.out.type, AddExpr.out.type)
            ShiftExpr.out.value := ShiftExpr_1.out.value.RSZ(AddExpr.out.value)
        else
            ERROR(“Both arguments to >>> must be integral type”)
            ShiftExpr.out.type := int
            ShiftExpr.out.value := undef
        endif

<AddExpr> ::=
    <MultExpr>
        MultExpr.in := AddExpr.in
        AddExpr.out := MultExpr.out
  | <AddExpr_1> + <MultExpr>
        AddExpr_1.in := AddExpr.in
        MultExpr.in := AddExpr.in
        if (TypeCheck(string, AddExpr_1.out.type) or
            TypeCheck(string, MultExpr.out.type))
            AddExpr.out.type := string
            AddExpr.out.value := AddExpr_1.out.value.string+(MultExpr.out.value)
        else if (TypeCheck(numeric, AddExpr_1.out.type) and
                 TypeCheck(numeric, MultExpr.out.type))
            AddExpr.out.type := promote(AddExpr_1.out.type, MultExpr.out.type)
            AddExpr.out.value := AddExpr_1.out.value + MultExpr.out.value
        else
            ERROR(“Both arguments to + must be numeric, or one a string”)
            AddExpr.out.type := int
            AddExpr.out.value := undef
        endif
  | <AddExpr_1> - <MultExpr>
        AddExpr_1.in := AddExpr.in
        MultExpr.in := AddExpr.in
        if (TypeCheck(numeric, AddExpr_1.out.type) and
            TypeCheck(numeric, MultExpr.out.type))
            AddExpr.out.type := promote(AddExpr_1.out.type, MultExpr.out.type)
            AddExpr.out.value := AddExpr_1.out.value - MultExpr.out.value
        else
            ERROR(“Both arguments to - must be NumType”)
            AddExpr.out.type := int
            AddExpr.out.value := undef
        endif
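
The typing of the + operator combines string concatenation, numeric promotion and the error default of int. The Java sketch below illustrates the three cases; PlusTyping, Type and typeOfPlus are hypothetical names, and promote is assumed to be the binary numeric promotion of the JLS, since its definition is not given in this section.

```java
import java.util.List;

// Hypothetical sketch: result type of "l + r" following the <AddExpr> rule.
final class PlusTyping {
    enum Type { BOOLEAN, BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, STRING, REF }

    static boolean isNumeric(Type t) {
        switch (t) {
            case BYTE: case SHORT: case CHAR: case INT:
            case LONG: case FLOAT: case DOUBLE:
                return true;
            default:
                return false;
        }
    }

    // Assumed meaning of promote: double, else float, else long, else int.
    static Type promote(Type l, Type r) {
        if (l == Type.DOUBLE || r == Type.DOUBLE) return Type.DOUBLE;
        if (l == Type.FLOAT || r == Type.FLOAT) return Type.FLOAT;
        if (l == Type.LONG || r == Type.LONG) return Type.LONG;
        return Type.INT;
    }

    static Type typeOfPlus(Type l, Type r, List<String> errors) {
        if (l == Type.STRING || r == Type.STRING) {
            return Type.STRING;     // one string operand is enough for concatenation
        }
        if (isNumeric(l) && isNumeric(r)) {
            return promote(l, r);
        }
        errors.add("Both arguments to + must be numeric, or one a string");
        return Type.INT;            // default type chosen for the error case
    }
}
```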

<MultExpr> ::=
    <UnaryExpr>
        UnaryExpr.in := MultExpr.in
        MultExpr.out := UnaryExpr.out
  | <MultExpr_1> * <UnaryExpr>
        MultExpr_1.in := MultExpr.in
        UnaryExpr.in := MultExpr.in
        if (TypeCheck(numeric, MultExpr_1.out.type) and
            TypeCheck(numeric, UnaryExpr.out.type))
            MultExpr.out.type := promote(MultExpr_1.out.type, UnaryExpr.out.type)
            MultExpr.out.value := MultExpr_1.out.value * UnaryExpr.out.value
        else
            ERROR(“Both arguments to * must be numeric”)
            MultExpr.out.type := int
            MultExpr.out.value := undef
        endif
  | <MultExpr_1> / <UnaryExpr>
        MultExpr_1.in := MultExpr.in
        UnaryExpr.in := MultExpr.in
        if (TypeCheck(numeric, MultExpr_1.out.type) and
            TypeCheck(numeric, UnaryExpr.out.type))
            MultExpr.out.type := promote(MultExpr_1.out.type, UnaryExpr.out.type)
            MultExpr.out.value := MultExpr_1.out.value / UnaryExpr.out.value
        else
            ERROR(“Both arguments to / must be numeric”)
            MultExpr.out.type := int
            MultExpr.out.value := undef
        endif
  | <MultExpr_1> % <UnaryExpr>
        MultExpr_1.in := MultExpr.in
        UnaryExpr.in := MultExpr.in
        if (TypeCheck(numeric, MultExpr_1.out.type) and
            TypeCheck(numeric, UnaryExpr.out.type))
            MultExpr.out.type := promote(MultExpr_1.out.type, UnaryExpr.out.type)
            MultExpr.out.value := MultExpr_1.out.value % UnaryExpr.out.value
        else
            ERROR(“Both arguments to % must be numeric”)
            MultExpr.out.type := int
            MultExpr.out.value := undef
        endif

<UnaryExprNotPlusMinus> ::=

<CastExpr>

    CastExpr.in := UnaryExprNotPlusMinus.in
    UnaryExprNotPlusMinus.out := CastExpr.out

| <PostExpr>

    PostExpr.in := UnaryExprNotPlusMinus.in
    UnaryExprNotPlusMinus.out := PostExpr.out

| ~ <UnaryExpr>

    UnaryExpr.in := UnaryExprNotPlusMinus.in
    if not (UnaryExpr.out.type.equals(integral))

        ERROR(“Argument of ~ must be primitive Integral Type”)
        UnaryExprNotPlusMinus.out.type := int
        UnaryExprNotPlusMinus.out.value := undef
    else

        UnaryExprNotPlusMinus.out.type := UnaryExpr.out.type
        UnaryExprNotPlusMinus.out.value := UnaryExpr.out.value.bitNOT()
    endif

| ! <UnaryExpr>

    UnaryExpr.in := UnaryExprNotPlusMinus.in
    if not (UnaryExpr.out.type.equals(boolean))

        ERROR(“Argument of ! must be boolean”)
        UnaryExprNotPlusMinus.out.type := boolean
        UnaryExprNotPlusMinus.out.value := undef
    else

        UnaryExprNotPlusMinus.out.type := UnaryExpr.out.type
        UnaryExprNotPlusMinus.out.value := UnaryExpr.out.value.NOT()
    endif

<CastExpr> ::=

******* Needs to check types for validity of cast ***********

( <PrimType> <Dims>? ) <UnaryExpr>

    PrimType.in := CastExpr.in
    Dims.in := CastExpr.in
    UnaryExpr.in := CastExpr.in

    CastExpr.out.type := array(PrimType.out.type, Dims.out.type)

| ( <Expr> ) <UnaryExprNotPlusMinus>

    Expr.in := CastExpr.in
    UnaryExprNotPlusMinus.in := CastExpr.in

    CastExpr.out.type := Expr.out.type

| ( <Name> <Dims> ) <UnaryExprNotPlusMinus>

    Name.in := CastExpr.in
    Dims.in := CastExpr.in
    UnaryExprNotPlusMinus.in := CastExpr.in

    CastExpr.out.type := array(Name.out.type, Dims.out.type)

<PostExpr> ::=

<Primary>

    Primary.in := PostExpr.in
    PostExpr.out := Primary.out

| <Name>

    Name.in := PostExpr.in
    if (Name.out.isExpression())

        PostExpr.out.type := Name.out.getType()
        PostExpr.out.value := Name.out.getValue()
    else

        ERROR(“Undefined variable ” + Name.out.value)
        PostExpr.out.type := undef
        PostExpr.out.value := undef
    endif







| <PostIncExpr>

    PostIncExpr.in := PostExpr.in
    PostExpr.out := PostIncExpr.out

| <PostDecExpr>

    PostDecExpr.in := PostExpr.in
    PostExpr.out := PostDecExpr.out

<PostIncExpr> ::=

<PostExpr> ++

    PostExpr.in := PostIncExpr.in
    if not (PostExpr.out.type.equals(numeric))

        ERROR(“Postfix Expr must be a variable of numeric type”)
        PostIncExpr.out.type := int
        PostIncExpr.out.value := undef
    else

        PostIncExpr.out.type := PostExpr.out.type
        if (PostExpr.out.value.defined())

            PostIncExpr.out.value := PostExpr.out.value + 1
        else

            PostIncExpr.out.value := undef
        endif
    endif

<PostDecExpr> ::=

<PostExpr> --

    PostExpr.in := PostDecExpr.in
    if not (PostExpr.out.type.equals(numeric))

        ERROR(“Postfix Expr must be a variable of numeric type”)
        PostDecExpr.out.type := int
        PostDecExpr.out.value := undef
    else

        PostDecExpr.out.type := PostExpr.out.type
        if (PostExpr.out.value.defined())

            PostDecExpr.out.value := PostExpr.out.value - 1
        else

            PostDecExpr.out.value := undef
        endif
    endif

<UnaryExpr> ::=

<PreIncExpr>

    PreIncExpr.in := UnaryExpr.in
    UnaryExpr.out := PreIncExpr.out

| <PreDecExpr>

    PreDecExpr.in := UnaryExpr.in
    UnaryExpr.out := PreDecExpr.out

| + <UnaryExpr1>

    UnaryExpr1.in := UnaryExpr.in
    if not (UnaryExpr1.out.type.equals(numeric))

        ERROR(“Argument of unary + must be numeric”)
        UnaryExpr.out.type := int
        UnaryExpr.out.value := undef
    else

        UnaryExpr.out := UnaryExpr1.out
    endif

| - <UnaryExpr1>

    UnaryExpr1.in := UnaryExpr.in
    if not (UnaryExpr1.out.type.equals(numeric))

        ERROR(“Argument of unary - must be numeric”)
        UnaryExpr.out.type := int
        UnaryExpr.out.value := undef
    else

        UnaryExpr.out.type := UnaryExpr1.out.type
        if (UnaryExpr1.out.value.defined())

            UnaryExpr.out.value := 0 - UnaryExpr1.out.value
        else

            UnaryExpr.out.value := undef
        endif
    endif

| <UnaryExprNotPlusMinus>

    UnaryExprNotPlusMinus.in := UnaryExpr.in
    UnaryExpr.out := UnaryExprNotPlusMinus.out

<PreIncExpr> ::=

++ <UnaryExpr>

    UnaryExpr.in := PreIncExpr.in
    PreIncExpr.out.value := undef
    if not (UnaryExpr.out.type.equals(numeric))

        ERROR(“Preincrement expr must be a variable of numeric type”)
        PreIncExpr.out.type := int
    else

        PreIncExpr.out.type := UnaryExpr.out.type
    endif

<PreDecExpr> ::=

-- <UnaryExpr>

    UnaryExpr.in := PreDecExpr.in
    PreDecExpr.out.value := undef
    if not (UnaryExpr.out.type.equals(numeric))

        ERROR(“Predecrement expr must be a variable of numeric type”)
        PreDecExpr.out.type := int
    else

        PreDecExpr.out.type := UnaryExpr.out.type
    endif

<Primary> ::=

<PrimaryNoNewArray>

    PrimaryNoNewArray.in := Primary.in
    Primary.out := PrimaryNoNewArray.out

| <ArrayCreationExpr>

    ArrayCreationExpr.in := Primary.in
    Primary.out := ArrayCreationExpr.out

<PrimaryNoNewArray> ::=

<Literal>

    Literal.in := PrimaryNoNewArray.in
    PrimaryNoNewArray.out := Literal.out

| this

    if not(PrimaryNoNewArray.context.isInstanceMethod() or
           PrimaryNoNewArray.context.isConstructor())

        ERROR(“this permitted only in an instance method or constructor”)
    endif

    PrimaryNoNewArray.out.value := undef
    PrimaryNoNewArray.out.type := PrimaryNoNewArray.in.context.getClass()

| ( <Expr> )

    Expr.in := PrimaryNoNewArray.in
    PrimaryNoNewArray.out := Expr.out

| <ClassInstCreationExpr>

    ClassInstCreationExpr.in := PrimaryNoNewArray.in
    PrimaryNoNewArray.out := ClassInstCreationExpr.out

| <FieldAcc>

    FieldAcc.in := PrimaryNoNewArray.in
    PrimaryNoNewArray.out := FieldAcc.out

| <MethodInv>

    MethodInv.in := PrimaryNoNewArray.in
    PrimaryNoNewArray.out := MethodInv.out

| <ArrayAccess>

    ArrayAccess.in := PrimaryNoNewArray.in
    PrimaryNoNewArray.out := ArrayAccess.out

<ArrayCreationExpr> ::=

new <PrimType> <DimExprList> <Dims>?

    PrimType.in := ArrayCreationExpr.in
    DimExprList.in := ArrayCreationExpr.in
    Dims.in := ArrayCreationExpr.in

    ArrayCreationExpr.out.value := undef
    ArrayCreationExpr.out.type :=
        PrimType.out.type.inc(DimExprList.out.value + Dims.out.value)

| new <ClassInterfaceType> <DimExprList> <Dims>?

    ClassInterfaceType.in := ArrayCreationExpr.in
    DimExprList.in := ArrayCreationExpr.in
    Dims.in := ArrayCreationExpr.in

    ArrayCreationExpr.out.value := undef
    ArrayCreationExpr.out.type :=
        ClassInterfaceType.out.type.inc(DimExprList.out.value + Dims.out.value)

<ClassInstCreationExpr> ::=

new <ClassType> ( <ArgList>? ) <ClassBody>?

    ClassType.in := ClassInstCreationExpr.in
    ArgList.in := ClassInstCreationExpr.in
    ClassBody.in := ClassInstCreationExpr.in
    ClassInstCreationExpr.out.type := ClassType.out.type
    ClassInstCreationExpr.out.value := undef
    ********* Finish this to check argument types for constructor ***********

| new <InterfaceType> () <ClassBody>

    InterfaceType.in := ClassInstCreationExpr.in
    ArgList.in := ClassInstCreationExpr.in
    ClassBody.in := ClassInstCreationExpr.in
    ClassInstCreationExpr.out.type := InterfaceType.out.type
    ClassInstCreationExpr.out.value := undef
    ********* Finish this to check types ***********

<FieldAcc> ::=

<Primary> . <Id>

    Primary.in := FieldAcc.in
    Id.in := FieldAcc.in

    if not (FieldAcc.in.env.typeCheck(ref, Primary.out.type))

        ERROR(Primary.out.type + “must be a reference type”)
        FieldAcc.out.type := null

    else if not (FieldAcc.in.env.idCheck(Primary.out.type, Id.out.type))

        ERROR(Id.out.value + “must be non-ambiguous and accessible”)
        FieldAcc.out.type := null

    else

        FieldAcc.out.type :=
            FieldAcc.in.env.lookupFieldType(Primary.out.type, Id.out.type)
        FieldAcc.out.value :=
            FieldAcc.in.env.lookupFieldValue(Primary.out.type, Id.out.type)

    endif

| super . <Id>

    Id.in := FieldAcc.in
    FieldAcc.out := Id.out

    if FieldAcc.context.className() == “java.lang.Object”

        ERROR(“Term super not permitted in class Object”)
    else if not(PrimaryNoNewArray.context.isInstanceMethod() or
                PrimaryNoNewArray.context.isConstructor())

        ERROR(“super permitted only in an instance method or constructor”)
    else

        FieldAcc.out.type := FieldAcc.in.env.lookupFieldType
            (FieldAcc.in.context.getSuper(), Id.out.type)
        FieldAcc.out.value := FieldAcc.in.env.lookupFieldValue
            (FieldAcc.in.context.getSuper(), Id.out.type)

    endif

<MethodInv> ::=

<Name> ( <ArgList>? )

    *** 15.11.1 Type Name ID not interface

| <Primary> . <Id> ( <ArgList>? )

    *** Id must be non-ambiguous and accessible

| super . <Id> ( <ArgList>? )

    if FieldAcc.context.className() == “java.lang.Object”

        ERROR(“Term super not permitted in class Object”)
        FieldAcc.out := FieldAcc.in

    else if not(PrimaryNoNewArray.context.isInstanceMethod() or
                PrimaryNoNewArray.context.isConstructor())

        ERROR(“super permitted only in an instance method or constructor”)
    else

        FieldAcc.out.type := FieldAcc.out.context.getSuper() + Id.out.type
    endif

<ArrayAccess> ::=

<Name> [ <Expr> ]

    Name.in := ArrayAccess.in
    Expr.in := ArrayAccess.in
    ArrayAccess.out.value := undef
    if not (Expr.out.type.promotableTo(int))

        ERROR(“Array indices must be integers”)
    endif

    if not (typeCheck(array, ArrayAccess.env.lookupType(Name.out.type)))

        ERROR(Name.value + “must be of array type”)
        ArrayAccess.out.type := undef
    else

        ArrayAccess.out.type :=
            unmkArrayType(ArrayAccess.env.lookupType(Name.out.type))
    endif

| <PrimaryNoNewArray> [ <Expr> ]

    PrimaryNoNewArray.in := ArrayAccess.in
    Expr.in := ArrayAccess.in
    ArrayAccess.out.value := undef
    if not (Expr.out.type.promotableTo(int))

        ERROR(“Array indices must be integers”)
    endif

    if not (typeCheck(array, PrimaryNoNewArray.out.type))

        ERROR(Name.value + “must be of array type”)
        ArrayAccess.out.type := undef
    else

        ArrayAccess.out.type :=
            unmkArrayType(PrimaryNoNewArray.out.type)
    endif

<ArgList> ::=

<Expr>

    Expr.in := ArgList.in
    ArgList.out := Expr.out

| <ArgList1> , <Expr>

    ArgList1.in := ArgList.in
    Expr.in := ArgList1.out
    ArgList.out := Expr.out

<DimExprList> ::=

<DimExpr>

    DimExpr.in := DimExprList.in
    DimExprList.out := DimExpr.out

| <DimExprList1> <DimExpr>

    DimExprList1.in := DimExprList.in
    DimExpr.in := DimExprList.in
    DimExprList.out.type := undef
    DimExprList.out.value := DimExprList1.out.value + 1

<DimExpr> ::=

[ <Expr> ]

    Expr.in := DimExpr.in
    if not (typeCheck(integral, Expr.out.type))

        ERROR (“Dimension declaration must be IntType”)
    endif

    DimExpr.out := Expr.out
    DimExpr.out.type := undef
    DimExpr.out.value := 1

<Dims> ::=

[ ]

    Dims.out := Dims.in
    Dims.out.type := undef
    Dims.out.value := 1

| <Dims1> [ ]

    Dims1.in := Dims.in
    Dims.out.value := Dims1.out.value + 1
    Dims.out.type := undef
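
The dimension counting in the <DimExprList> and <Dims> rules (each bracket pair contributes 1, and <ArrayCreationExpr> adds the two counts) can be checked against real Java arrays. The following is a minimal sketch; the class name DimsDemo and the helper dimensions are illustrative names that do not appear in the grammar.

```java
/** Illustrative only: counts array dimensions, mirroring DimExprList.out.value + Dims.out.value. */
final class DimsDemo {
    /** Number of array dimensions of the run-time class of the given object. */
    static int dimensions(Object value) {
        int count = 0;
        Class<?> c = value.getClass();
        while (c.isArray()) {            // peel off one [] per iteration
            count++;
            c = c.getComponentType();
        }
        return count;
    }

    public static void main(String[] args) {
        // Two dimension expressions ([2][3]) plus one empty bracket pair ([]).
        System.out.println(dimensions(new int[2][3][]));
        // One dimension expression and no empty bracket pairs.
        System.out.println(dimensions(new String[4]));
    }
}
```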
1 Introduction

Java combines the experience from the development of several object-oriented
languages, such as C++, Smalltalk and CLOS. The philosophy of the language
designers was to include only features with already known semantics, and to
provide a small and simple language.

Nevertheless, we feel that the introduction of some new features in Java, as
well as the specific combination of features, justifies a study of the Java formal
semantics. The use of interfaces is a simplification of the signatures extension
for C++ and is, to the best of our knowledge, novel. The mechanism for dynamic
method binding is that of C++, but we know of no formal definition. Java adopts
the Smalltalk approach whereby all object variables are implicitly pointers.

Furthermore, although there are a large number of studies of the semantics
of isolated programming language features or of minimal programming
languages, there have not been many studies of the formal semantics of
actual programming languages. In addition, the interplay of features which are
very well understood in isolation might introduce unexpected effects.

Experience confirms the importance of formal studies of type systems early
on during language development. Eiffel, a language first introduced in 1985, was
discovered to have a loophole in its type system in 1990. Given the growing
usage of Java, it seems important that if there are loopholes in the type system
they be discovered early on.

We argue that the type system of Java is sound, in the sense that unless an 
exception is raised, the evaluation of any expression will produce a value of a 
type “compatible” with the type assigned to it by the type system. 

We were initially attracted to Java because of its elegant combination of
several tried language features. For this work we were guided by the language
description, which answered any question relating to semantics unambiguously.
However, we discovered some rules to be more restrictive than
necessary, and the reasons for some design decisions were not obvious. We hope
that the language authors will publish a language design rationale soon.

1.1 The Java Subset Considered so Far



In this paper we consider the following parts of the Java language: primitive 
types, classes and inheritance, instance variables and instance methods, inter- 
faces, shadowing of instance variables, dynamic method binding, object creation 
with new, the null value, arrays, exceptions and exception handling. 

We chose this Java subset because we consider the Java way of combining
classes, interfaces and dynamic method binding to be both novel and interesting.
Furthermore, we chose an imperative subset right from the start, because the
extension of type systems to the imperative case has sometimes uncovered new
problems (e.g. multi-methods for functional languages and for imperative
languages, or the Damas and Milner polymorphic type system for functional
languages and its imperative extension). We considered arrays because of the
known requirement for run-time type checking.

In contrast with our previous work, we follow the language description rather
than the more general approach outlined in older versions of the language
description.



1.2 Our Approach 

We define Javas, a provably safe subset of Java containing the features listed pre- 
viously, a term rewrite system to describe the operational semantics and a type 
inference system to describe compile-time type checking. We prove that program 
execution preserves the types up to the subclass/subinterface relationship. 

Java  ⊃  Javas   →   Javase   ⊂   Javar   ⇝p   Javar
           ↓           ↓            ↓            ↓
          Type    =   Type     =   Type   ≥wdn  Type

We aimed to keep the description straightforward, and so we have removed
some of the syntactic sugar in Java, e.g. we require instance variable access to
have the form this.var as opposed to var, and we require the last statement in a
method to be a return statement. These restrictions simplify the type inference
and term rewriting systems, but do not diminish the applicability to Java itself.
It only takes a simple transformation to turn a Java program from the domain
under consideration into the corresponding Javas program.
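
The two restrictions just mentioned can be illustrated in ordinary Java. The sketch below is only an illustration written in Java syntax, not in Javas syntax; the class Counter and the methods bumpJava and bumpJavas are hypothetical names. The second method is the first one rewritten so that every field access has the form this.var and the only return is the last statement.

```java
/** Illustrative class; none of these names come from the text. */
final class Counter {
    int count;

    /** Ordinary Java style: bare field access and an early return. */
    int bumpJava(int limit) {
        if (count >= limit) {
            return count;
        }
        count = count + 1;
        return count;
    }

    /** Restricted style: fields accessed as this.var, single return as the last statement. */
    int bumpJavas(int limit) {
        if (this.count < limit) {
            this.count = this.count + 1;
        } else {
            // limit reached: leave the field unchanged
        }
        return this.count;
    }

    public static void main(String[] args) {
        Counter a = new Counter();
        Counter b = new Counter();
        for (int i = 0; i < 5; i++) {
            // Both columns should agree on every call.
            System.out.println(a.bumpJava(3) + " " + b.bumpJavas(3));
        }
    }
}
```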

The type system is described in terms of an inference system. In contrast with
many type systems for object-oriented languages, it does not have a subsumption
rule, a crucial property when type checking message expressions. Contrary to
Java, Javas statements have a type, and thus we can type check the return
values of method bodies.

The execution of Java programs requires some type information at run-time
(e.g. method descriptors as in chapter 15.11 of the language description). For
this reason, we define Javase, an enriched version of Javas containing
compile-time type information to be used for method call and field access.

During execution, these terms may be rewritten to terms which are not
described by the enriched language, Javase. We therefore extend the language,
obtaining Javar, which describes run-time terms. In previous work we
did not distinguish between Javase and Javar; instead, we only considered one
enriched and extended language. However, as Don Syme pointed out early on,
the two different reasons for language modifications should naturally lead to
distinct languages. Also, such a distinction allows a clearer description of the
concepts. Last but not least, this distinction is necessary for the formalization
of the notions around binary compatibility.

The operational semantics is defined as a ternary rewrite relationship be- 
tween configurations, programs and configurations. Configurations are tuples of 
Javar terms and states. The terms represent the part of the original program 
remaining to be executed. We describe method calls through textual substitu- 
tion. 

We have been able to avoid additional structures such as program counters
and higher order functions. The Javas simplifications of eliminating block
structure and local variables allow the definition of the state as a flat structure, where
addresses are mapped to objects and global variables are mapped to primitive
values or addresses. Objects carry their classes (similar to the Smalltalk abstract
machine), and thus we do not need store types or location typings. Objects
are labelled tuples, where each label contains the class in which it was
declared. Array values are tuples too, and they are annotated by their type and
their dimension.
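
The flat state described above can be pictured with a small Java model. This is a minimal sketch under stated assumptions rather than the formal definition: the classes Obj and State, the use of integers as addresses, and the "DeclaringClass.field" label encoding are all choices made here for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** An object carries its class and a labelled tuple of fields (illustrative model). */
final class Obj {
    final String className;
    // Label = declaring class + "." + field name, so shadowed fields coexist.
    final Map<String, Object> fields = new LinkedHashMap<>();

    Obj(String className) {
        this.className = className;
    }

    Obj with(String declaringClass, String field, Object value) {
        fields.put(declaringClass + "." + field, value);
        return this;
    }

    Object get(String declaringClass, String field) {
        return fields.get(declaringClass + "." + field);
    }
}

/** A flat state: addresses map to objects, global variables map to values or addresses. */
final class State {
    final Map<Integer, Obj> heap = new HashMap<>();
    final Map<String, Object> globals = new HashMap<>();
    private int nextAddress = 0;

    int allocate(Obj o) {
        int address = nextAddress++;
        heap.put(address, o);
        return address;
    }

    public static void main(String[] args) {
        State s = new State();
        int addr = s.allocate(new Obj("FrPhil")
                .with("Phil", "like", "aTruth")
                .with("FrPhil", "like", "oyster"));
        s.globals.put("pascal", addr);
        s.globals.put("aPhil", addr);   // both variables hold the same address

        Obj target = s.heap.get((Integer) s.globals.get("aPhil"));
        System.out.println(target.className);
        System.out.println(target.get("Phil", "like"));    // label chosen for static type Phil
        System.out.println(target.get("FrPhil", "like"));  // label chosen for static type FrPhil
    }
}
```

Because every label records the declaring class, a single FrPhil object can hold both like fields, which is what makes shadowing by static type expressible without store types.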

There are strong links between our work and that described in the next two
chapters of this book. Don Syme describes in chapter 4 the formalization
of a large part of this work using his theorem checker, Declare. During this
process he uncovered a major flaw in our work, which will be described later on.
A close collaboration ensued.

David von Oheimb and Tobias Nipkow have encoded their formalization of 
an enriched language similar to Javase into the theorem prover Isabelle. Thus the 
treatment of the original language, Javas, is omitted. Their description of most 
language constructs is similar to ours, except for exceptions, for which they 
use a dedicated component of the run-time configuration. More importantly,
they used a large-step operational semantics. This turned out to have a decisive
influence on the necessary proofs, and to allow for spectacular simplifications.
Thus, in the large-step semantics inconsistent intermediate states need not be
considered and most lemmas could be significantly simplified. This difference
came as a surprise to all authors. On the other hand, strictly speaking, large-step
semantics cannot make any promise about non-terminating programs not
breaking the type system, nor is it yet clear how large-step semantics could
adequately describe coroutines.

The rest of this chapter is organized as follows: In section 2 we give an
example in Java, which we use to illustrate the concepts introduced in the subsequent
sections. In section 3 we give the syntax of Javas. The subsequent sections
define the language Javase; the static types for Javas and the mapping
from Javas to Javase; the types of Javase terms and the types of Javar terms;
states, configurations and the operational semantics for Javar; and the
properties of the operational semantics, in particular the Subject Reduction
Theorem. In the final section we draw some conclusions.

2 An Example in Java 

The following, admittedly contrived, Java program serves to demonstrate some 
of the Java features that we tackle, and will be used in later sections to illustrate 
our approach. It can have the following interpretation: Philosophers like truths. 
When a philosopher thinks about a problem together with another philosopher, 
then, after some deliberation, they refer the problem to a third philosopher. 
When a philosopher thinks together with a French philosopher, they produce a 
book. French philosophers like food; they too may think together with another 
philosopher, and finally refer the question to another philosopher. 

Assuming previous definitions of classes Book, Food and Truth, consider the 
classes Phil, FrPhil defined as: 

class Phil {
    Truth like;
    Phil think(Phil y) { ... }
    Book think(FrPhil y) { ... }
}

class FrPhil extends Phil {
    Food like;
    Phil think(Phil y) { like = oyster; ... }
}

Consider the following declarations and expressions: 

Phil aPhil; FrPhil pascal = new FrPhil();
... aPhil.like
... aPhil.think(pascal) ... aPhil.think(aPhil)
... pascal.like
... pascal.think(pascal) ... pascal.think(aPhil)

The above example demonstrates: 

- Recursive scopes, e.g. the class FrPhil is visible inside the class Phil, that
  is, before its declaration.

- Shadowing of instance variables by static types, e.g. pascal.like is an
  object of class Food, whereas aPhil.like indicates an object of class Truth,
  even after the assignment aPhil = pascal.

- Method binding according to the dynamic class of the receiver, and the static
  class of the arguments: The call aPhil.think(pascal) will result in calling
  the method Phil::think(FrPhil) (i.e. the think method declared in class
  Phil and which takes a FrPhil argument, and returns a Book), even if aPhil
  contains a pointer to a FrPhil object. The call aPhil.think(aPhil) will
  result in calling the method Phil::think(Phil) if aPhil is an object of
  class Phil, and it will result in calling FrPhil::think(Phil) if aPhil is
  an object of class FrPhil. The call pascal.think(pascal) is ambiguous,
  because the methods Phil::think(FrPhil) and FrPhil::think(Phil) are
  applicable, and neither is more specific than the other.
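
The shadowing and method binding behaviour listed above can be observed by completing the example into a runnable program. The following is a sketch under stated assumptions: the method bodies, the empty classes Truth, Food and Book, the trace field lastCalled and the class PhilDemo are all additions made for illustration. The call pascal.think(pascal) is left out because the text above classifies it as ambiguous.

```java
class Truth {}
class Food {}
class Book {}

class Phil {
    static String lastCalled = "";           // hypothetical trace field, not in the original example
    Truth like = new Truth();

    Phil think(Phil y)   { lastCalled = "Phil::think(Phil)";   return y; }
    Book think(FrPhil y) { lastCalled = "Phil::think(FrPhil)"; return new Book(); }
}

class FrPhil extends Phil {
    Food like = new Food();                  // shadows Phil.like

    @Override
    Phil think(Phil y)   { lastCalled = "FrPhil::think(Phil)"; return y; }
}

final class PhilDemo {
    public static void main(String[] args) {
        FrPhil pascal = new FrPhil();
        Phil aPhil = pascal;                 // aPhil now points to a FrPhil object

        // Field selection follows the static type of the variable.
        Object viaPhil = aPhil.like;
        Object viaFrPhil = pascal.like;
        System.out.println(viaPhil.getClass().getSimpleName());
        System.out.println(viaFrPhil.getClass().getSimpleName());

        // Overload chosen from the static argument type, body from the dynamic receiver class.
        aPhil.think(pascal);
        System.out.println(Phil.lastCalled);
        aPhil.think(aPhil);
        System.out.println(Phil.lastCalled);
        pascal.think(aPhil);
        System.out.println(Phil.lastCalled);
    }
}
```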

3 The Language Javas

Javas is a subset of Java, which includes classes, instance variables, instance
methods, inheritance of instance methods and variables, shadowing of instance
variables, interfaces, widening, method calls, assignments, object creation and
access, the null value, instance variable access and the exception NullPE, arrays,
array creation and the exceptions ArrStoreE, NegSzeE and IndOutBndE. The
features we have not yet considered include initializers, constructors, finalizers,
class variables and class methods, local variables, class modifiers, final/abstract
classes and methods, super, strings, numeric promotions and widenings,
concurrency, packages and separate compilation.

There are slight differences between the syntax of Javas and Java which were
introduced to simplify the formal description. A Java program contains both type
(i.e. variable declarations, parameter and result types for methods, interfaces
of classes) and evaluation information (i.e. statements in method bodies). In
Javas this information is split into two: type information is contained in the
environment (usually represented by a Γ), whereas evaluation information is in
the program (usually represented by a p).

We follow the convention that Javas keywords appear as keyword, identifiers
as identifier, nonterminals appear in italics as Nonterminal, and the
metalanguage symbols appear in Roman (e.g. ::=, (, *, )). Identifiers with the suffix
Id (e.g. VarId) indicate the identifiers of newly declared entities, whereas
identifiers with the suffix Name (e.g. VarName) are entities that have been previously
declared.



3.1 Javas Programs 

A program, as described in figure 1, consists of a sequence of class bodies. Class
bodies consist of a sequence of method bodies. Method bodies consist of the 
method identifier, the names and types of the arguments, and a statement se- 
quence. We require that there is exactly one return statement in each method 
body, and that it is the last statement. This simplifies the Javas operational se- 
mantics without restricting the expressiveness, since it requires at most a minor 
transformation to enable any Java method body to satisfy this property. 

We only consider conditional statements, assignments, method calls, try and 
throw statements. This is because loop, break, continue and case statements 
can be coded in terms of conditionals and recursion. 
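
As a small illustration of the claim that loops can be coded with conditionals and recursion, the sketch below gives the same computation twice in ordinary Java. The class and method names are illustrative only, and the recursive version carries the loop variables as parameters.

```java
/** Illustrative only: a while loop and its encoding by a conditional plus recursion. */
final class LoopAsRecursion {
    /** Loop version: sum of the integers 1..n. */
    static int sumLoop(int n) {
        int acc = 0;
        while (n > 0) {
            acc = acc + n;
            n = n - 1;
        }
        return acc;
    }

    /** Same computation: the loop test becomes a conditional, the loop body a recursive call. */
    static int sumRec(int n, int acc) {
        if (n > 0) {
            return sumRec(n - 1, acc + n);
        } else {
            return acc;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumLoop(10));
        System.out.println(sumRec(10, 0));
    }
}
```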






Sophia Drossopoulou and Susan Eisenbach 



Program    ::= ( ClassBody )*
ClassBody  ::= ClassId ext ClassName { ( MethBody )* }
MethBody   ::= MethId is ( λ VarId : VarType . )* { Stmts ; return [Expr] }
Stmts      ::= ε | Stmts ; Stmt
Stmt       ::= if Expr then Stmts else Stmts
             | Var := Expr | Expr.MethName(Expr*) | throw Expr
             | try Stmts (catch ClassName Id Stmts)* finally Stmts
             | try Stmts (catch ClassName Id Stmts)+
Expr       ::= Value | Var | Expr.MethName(Expr*) | new ClassName
             | new SimpleType ( [ Expr ] )+ ( [ ] )*
Var        ::= Name | Var.VarName | Var[Expr] | this
Value      ::= PrimValue | null
PrimValue  ::= intValue | charValue | byteValue | ...
VarType    ::= SimpleType | ArrayType
SimpleType ::= PrimType | ClassName | InterfaceName
ArrayType  ::= SimpleType[ ] | ArrayType[ ]
PrimType   ::= bool | char | int

Fig. 1. Javas programs



We consider values, method calls, and instance variable access. Java values
are primitive (e.g. literals such as true, false, 3, 'c' etc.), references or arrays.
References are null, or pointers to objects. The expression new C creates a new
object of class C, whereas the expression new T[e1]...[en][]1...[]k, n ≥ 1, k ≥ 0,
creates an (n+k)-dimensional array value. Pointers to objects are implicit. We
distinguish variable types (sets of possible run-time values for variables) and
method types, as can be seen in figure 5.

Javas programs contain the class hierarchy. Thus, from a program p we can
deduce the $\sqsubseteq$ relationship, which is the transitive closure of the immediate
superclass relation, and also applies to arrays whose component types are
subclasses of each other. This relation is defined in figure 2. We use the notation
p = p', C ext C'{...}, p'' to indicate that p contains a declaration of class C as
a subclass of C'. The assertion $p \vdash C \sqsubseteq C'$ indicates that given program p, C is a
subclass of C'.



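$$\frac{p = p',\ C\ \mathrm{ext}\ C'\{...\},\ p''}{p \vdash C \sqsubseteq C' \qquad p \vdash C \sqsubseteq C}$$

$$\frac{p \vdash C \sqsubseteq C' \qquad p \vdash C' \sqsubseteq C''}{p \vdash C \sqsubseteq C''} \qquad \frac{p \vdash C \sqsubseteq C'}{p \vdash C[\,] \sqsubseteq C'[\,]} \qquad p \vdash \mathrm{nil} \sqsubseteq C$$

Fig. 2. Subclasses deduced from programs p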
Given a program p we define the functions p(C), which looks up a class body
with identifier C in p, and Classes(p), which is the set of the identifiers of all
classes defined in p.

Definition 1 For a program p, we define p(C) and Classes(p) as follows:

- p(C) = cBody iff p = p', cBody, p'', and cBody = C ext C' impl ...

- p(C) = Undef, otherwise.

- $C \in Classes(p)$ iff $p \vdash C \sqsubseteq C$.
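
The subclass relation deduced from a program, together with Classes(p), can be prototyped in a few lines of Java. This is a minimal sketch under stated assumptions: a program is reduced to a map from each declared class identifier to its direct superclass, array types are written with a "[]" suffix, the nil judgment is not modelled, and the names ClassTable, declare, classes and isSubclass are illustrative.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a program p: declared class identifier -> direct superclass. */
final class ClassTable {
    private final Map<String, String> superOf = new LinkedHashMap<>();

    /** Records a declaration "cls ext superclass {...}". */
    ClassTable declare(String cls, String superclass) {
        superOf.put(cls, superclass);
        return this;
    }

    /** Classes(p): the identifiers of all classes declared in p. */
    Set<String> classes() {
        return superOf.keySet();
    }

    /** Decides whether c is a subclass of d according to the declarations in p. */
    boolean isSubclass(String c, String d) {
        if (c.endsWith("[]") && d.endsWith("[]")) {
            // Array rule: C[] is a subclass of C'[] whenever C is a subclass of C'.
            return isSubclass(c.substring(0, c.length() - 2), d.substring(0, d.length() - 2));
        }
        if (!superOf.containsKey(c)) {
            return false;                    // c is not declared in p
        }
        Set<String> seen = new HashSet<>();
        String current = c;
        // Follow "ext" links; the seen set guards against cyclic declarations.
        while (current != null && seen.add(current)) {
            if (current.equals(d)) {
                return true;
            }
            current = superOf.get(current);
        }
        return false;
    }

    public static void main(String[] args) {
        ClassTable p = new ClassTable()
                .declare("Phil", "Object")
                .declare("FrPhil", "Phil");
        System.out.println(p.classes());
        System.out.println(p.isSubclass("FrPhil", "Object"));
        System.out.println(p.isSubclass("Phil", "FrPhil"));
        System.out.println(p.isSubclass("FrPhil[]", "Phil[]"));
    }
}
```

In this model a class is in Classes(p) exactly when isSubclass(c, c) holds, which mirrors the last clause of Definition 1.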

3.2 Environments 

The environment, described in figure 3 and usually denoted by a Γ, contains both
the subclass and interface hierarchies and variable type declarations. It also
contains the type definitions of all variables and methods of a class and its
interface. StandardEnv should include all the predefined classes, and all the classes
described in chapters 20-22 of the language description, e.g. the exception
classes Exception, NullPE, ArrStoreE, IndOutBndE, NegSzeE and others; we do
not need to distinguish between checked and unchecked exceptions. Declarations
may be class declarations, interface declarations or identifier declarations.



Env         ::= StandardEnv | Env ; Decl
StandardEnv ::= Exception ext Object ... NullPE ext Exception ... ; ...
Decl        ::= ClassId ext ClassName impl (InterfName)*
                    { (VarId : VarType)* (MethId : MethType)* }
              | InterfId ext InterfName* { (MethId : MethType)* }
              | VarId : VarType
MethType    ::= ArgType → (VarType | void)
ArgType     ::= [ VarType ( × VarType )* ]
VarType     ::= SimpleType | ArrayType
SimpleType  ::= PrimType | ClassName | InterfaceName
ArrayType   ::= SimpleType[ ] | ArrayType[ ]
PrimType    ::= bool | char | int | ...
Type        ::= VarType | void | nil | MethType | ClassName-Thrn

Fig. 3. Javas environments



A class declaration introduces a new class as a subclass of another class 
(if no explicit superclass is given, then Object will be assumed), a sequence 
of component declarations, and optionally, interfaces implemented by the class. 
Component declarations consist of field identifiers and their types, and method 
identifiers and their signatures. Since method bodies are not declarations, they 
are found in the program part rather than the environment. 

An interface declaration introduces a new interface as a subinterface of several
other interfaces and a sequence of components. The only interface components
in Javas are methods, because interface variables are implicitly static, and have
not been considered. Variable declarations introduce variables of a given type.

3.3 The Example in Javas

The Java philosophers classes from section 2 correspond to the Javas program ps:

ps = Phil ext Object {
         think is λy:Phil . {...}
         think is λy:FrPhil . {...}
     }
     FrPhil ext Phil {
         think is λy:Phil . {this.like := oyster; ...}
     }

The corresponding Javas environment Γ0 is:

Γ0 = Phil ext Object { like : Truth,
                       think : Phil → Phil,
                       think : FrPhil → Book },
     FrPhil ext Phil { like : Food,
                       think : Phil → Phil },
     aPhil : Phil, pascal : FrPhil

3.4 Subclasses, Subinterfaces, Widening 

The subclass $\sqsubseteq$ and the implements $:_{imp}$ relations deduced from an environment
Γ are defined by the inference rules in figure 4.

By the assertion Γ = Γ', def, Γ'' we indicate that Γ contains the definition
def. Every class introduced in Γ is its own subclass, and the assertion $\Gamma \vdash C \sqsubseteq C$
indicates that C is defined in the environment Γ as a class. The direct
superclass of a class is indicated in its declaration. Object is a predefined class. The
assertion $\Gamma \vdash C :_{imp} I$ indicates that the class C was declared in Γ as providing
an implementation for interface I. The subclass relationship is transitive. Every
interface is its own subinterface and the assertion $\Gamma \vdash I \le I$ indicates that I is
defined in the environment Γ as an interface. The superinterface of an interface
is indicated in its declaration. The subinterface relationship is transitive.



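$$\frac{\Gamma \vdash C \sqsubseteq C}{\Gamma \vdash C\ \diamond_{VarType}} \qquad \frac{\Gamma \vdash I \le I}{\Gamma \vdash I\ \diamond_{VarType}} \qquad \frac{\Gamma \vdash T\ \diamond_{VarType}}{\Gamma \vdash T[\,]\ \diamond_{VarType}}$$

$$\vdash \mathrm{int}\ \diamond_{VarType} \qquad \vdash \mathrm{char}\ \diamond_{VarType} \qquad \vdash \mathrm{bool}\ \diamond_{VarType}$$

$$\frac{\Gamma \vdash T\ \diamond_{VarType}\ \text{or}\ T = \mathrm{void} \qquad \Gamma \vdash T_i\ \diamond_{VarType},\ i \in \{1...n\},\ n \ge 0}{\Gamma \vdash T_1 \times ... \times T_n\ \diamond_{ArgType} \qquad \Gamma \vdash T_1 \times ... \times T_n \rightarrow T\ \diamond_{MethType}}$$

Fig. 5. Variable and method types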



Variable types, i.e. primitive types, interfaces, classes and arrays, are defined
in figure 5 and are required in type declarations. Method types, i.e. n
argument types, with $n \geq 0$, and a result type, are also defined in figure 5 and are
required in method declarations. The assertion $\Gamma \vdash T \;\diamond_{VarType}$ means that T
is a variable type, $\Gamma \vdash AT \;\diamond_{ArgType}$ means that AT is a method argument type,
and $\Gamma \vdash MT \;\diamond_{MethType}$ means that MT is a method type. Note that we do not
keep track of potentially throwable exceptions in the method type. However, in
future work method types should be extended to do so, and a stronger subject
reduction theorem should be proven, stating that a checked exception can only
be thrown during execution of a method that mentions this exception's class (or
superclass) in the method's type.

The widening relationship, described in figure 6, exists between variable
types. If a type T can be widened to a type T' (expressed as $\Gamma \vdash T \leq_{wdn} T'$),
then a value of type T can be assigned to a variable of type T' without any
run-time casting or checking taking place. This is defined in chapter 5.1.4 of the
language description; chapter 5.1.2 there defines widening of primitive types, but
here we shall only be concerned with widening of references. Furthermore, for the
null value, we introduce the type nil, which can be widened to any array, class or
interface.
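\[
\frac{\Gamma \vdash T \;\diamond_{VarType}}{\Gamma \vdash T \leq_{wdn} T}
\qquad
\frac{\Gamma \vdash T \leq_{wdn} \mathsf{Object}}{\Gamma \vdash \mathsf{nil} \leq_{wdn} T}
\qquad
\frac{\Gamma \vdash T \leq T}{\Gamma \vdash T \leq_{wdn} \mathsf{Object}}
\]

\[
\frac{\Gamma \vdash T \sqsubseteq T'}{\Gamma \vdash T \leq_{wdn} T'}
\qquad
\frac{\Gamma \vdash T \leq T'}{\Gamma \vdash T \leq_{wdn} T'}
\qquad
\frac{\Gamma \vdash T \leq_{wdn} T'}{\Gamma \vdash T[\,] \leq_{wdn} T'[\,]}
\]

\[
\frac{\Gamma \vdash T \sqsubseteq T' \qquad \Gamma \vdash T' :_{imp} T'' \qquad \Gamma \vdash T'' \leq T'''}
     {\Gamma \vdash T \leq_{wdn} T'''}
\qquad
\frac{\Gamma \vdash T \;\diamond_{VarType}}{\Gamma \vdash T[\,] \leq_{wdn} \mathsf{Object}}
\]

Fig. 6. The widening relationship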
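As an informal illustration, the widening rules can be read as a recursive check over a table of declarations. The following is only a sketch under stated assumptions: types are plain strings, the class names Phil and FrPhil come from the philosophers example, while the interfaces Thinker and Being, the class name WideningSketch and all method names are hypothetical additions made for the sake of the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a widening check over a tiny, hand-built environment. */
final class WideningSketch {
    // Direct superclass of each declared class; Object is predefined and has none.
    static final Map<String, String> SUPERCLASS = new HashMap<>();
    // Interfaces that each class is declared to implement.
    static final Map<String, List<String>> IMPLEMENTS = new HashMap<>();
    // Direct superinterfaces of each declared interface.
    static final Map<String, List<String>> SUPERINTERFACES = new HashMap<>();
    static final List<String> PRIMITIVES = Arrays.asList("int", "char", "bool");

    static {
        SUPERCLASS.put("Phil", "Object");
        SUPERCLASS.put("FrPhil", "Phil");
        // Hypothetical interfaces, not part of the philosophers example.
        SUPERINTERFACES.put("Being", Collections.<String>emptyList());
        SUPERINTERFACES.put("Thinker", Arrays.asList("Being"));
        IMPLEMENTS.put("Phil", Arrays.asList("Thinker"));
    }

    static boolean isClass(String t) { return t.equals("Object") || SUPERCLASS.containsKey(t); }
    static boolean isInterface(String t) { return SUPERINTERFACES.containsKey(t); }
    static boolean isArray(String t) { return t.endsWith("[]"); }
    static String elem(String t) { return t.substring(0, t.length() - 2); }

    /** Variable types: primitives, classes, interfaces, and arrays of variable types. */
    static boolean isVarType(String t) {
        if (isArray(t)) return isVarType(elem(t));
        return PRIMITIVES.contains(t) || isClass(t) || isInterface(t);
    }

    /** Reflexive, transitive subclass relation between declared classes. */
    static boolean subclass(String c, String d) {
        if (!isClass(c)) return false;
        for (String cur = c; cur != null; cur = SUPERCLASS.get(cur)) {
            if (cur.equals(d)) return true;
        }
        return false;
    }

    /** Reflexive, transitive subinterface relation between declared interfaces. */
    static boolean subinterface(String i, String j) {
        if (!isInterface(i)) return false;
        if (i.equals(j)) return true;
        for (String sup : SUPERINTERFACES.get(i)) {
            if (subinterface(sup, j)) return true;
        }
        return false;
    }

    /** True if type t can be widened to type u. */
    static boolean widens(String t, String u) {
        // nil widens to any array, class or interface, but not to a primitive type.
        if (t.equals("nil")) return !PRIMITIVES.contains(u) && isVarType(u);
        if (!isVarType(t) || !isVarType(u)) return false;
        if (t.equals(u)) return true;                       // reflexivity
        if (isArray(t)) {
            if (u.equals("Object")) return true;            // arrays widen to Object
            return isArray(u) && widens(elem(t), elem(u));  // component-wise widening
        }
        if (isClass(t)) {
            if (isClass(u)) return subclass(t, u);
            // A class widens to an interface if some superclass implements a subinterface of it.
            for (String c = t; c != null; c = SUPERCLASS.get(c)) {
                for (String i : IMPLEMENTS.getOrDefault(c, Collections.<String>emptyList())) {
                    if (subinterface(i, u)) return true;
                }
            }
            return false;
        }
        // Assumption of this sketch: an interface also widens to Object.
        if (isInterface(t)) return u.equals("Object") || subinterface(t, u);
        return false; // primitive types widen only to themselves in this sketch
    }
}
```

For instance, `WideningSketch.widens("FrPhil", "Phil")` and `WideningSketch.widens("nil", "Phil[]")` hold, whereas `WideningSketch.widens("Phil", "FrPhil")` does not.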



3.5 Well-Formed Declarations and Environments 

The relations $\sqsubseteq$, $:_{imp}$, $\leq$ and $\leq_{wdn}$ are computable for any environment, as can
be straightforwardly shown. In figure 7 we describe the Java requirements for
variable, class and interface declarations to be well-formed.

We indicate by $\Gamma \vdash \Gamma' \;\diamond$ that the declarations in environment $\Gamma'$ are well-formed
under the declarations of the larger environment $\Gamma$. We need to consider
a larger environment $\Gamma$ because Java allows forward declarations (e.g. in the
philosophers example, class Phil uses the class FrPhil, whose declaration follows
that of Phil). We shall call $\Gamma$ well-formed iff $\Gamma \vdash \Gamma \;\diamond$, in which case we use
the shorthand $\Gamma \vdash \diamond$, cf. the third rule in figure 7. The assertion $\Gamma \vdash \Gamma' \;\diamond$ is
checked in two stages: the first stage establishes the relations $\sqsubseteq$, $:_{imp}$, $\leq$ and
$\leq_{wdn}$ for the complete environment $\Gamma$ and establishes that $\sqsubseteq$ and $\leq$ are acyclic;
if this is the case, then the second stage establishes that the declarations in $\Gamma'$
are well-formed one by one, according to the rules in this section.

Not surprisingly, the empty environment is well-formed, cf. the first rule in
figure 7.

We need the notion of definition table lookup, i.e. $\Gamma(id)$, which returns the
definition of the identifier id in $\Gamma$, if it has one.

Definition 2 For an environment $\Gamma$ with unique definitions for every identifier,
define $\Gamma(id)$ as follows:

- $\Gamma(x) = T$ iff $\Gamma = \Gamma', x : T, \Gamma''$

- $\Gamma(C) = C\ \mathsf{ext}\ C'\ \mathsf{impl}\ I_1, ... I_n \{v_1 : T_1, ... v_m : T_m, m_1 : MT_1, ... m_k : MT_k\}$ iff
  $\Gamma = \Gamma', C\ \mathsf{ext}\ C'\ \mathsf{impl}\ I_1, ... I_n \{v_1 : T_1, ... v_m : T_m, m_1 : MT_1, ... m_k : MT_k\}, \Gamma''$

- $\Gamma(I) = I\ \mathsf{ext}\ I_1, ... I_n \{m_1 : MT_1, ... m_k : MT_k\}$ iff
  $\Gamma = \Gamma', I\ \mathsf{ext}\ I_1, ... I_n \{m_1 : MT_1, ... m_k : MT_k\}, \Gamma''$

- $\Gamma(id) = Undef$ otherwise
Furthermore, $Classes(\Gamma)$ and $Interfaces(\Gamma)$ contain the identifiers of all classes
and interfaces declared in $\Gamma$, i.e.

- $C \in Classes(\Gamma)$ iff $\Gamma \vdash C \sqsubseteq C$.

- $I \in Interfaces(\Gamma)$ iff $\Gamma \vdash I \leq I$.

A variable should be declared to have a variable type and it should be declared
only once, cf. the second rule in figure 7. The type declaration for T may
textually follow that of the variable x, as for example in:

A x; ... class A ....

We now consider when class declarations are well-formed. For this we shall
need several auxiliary concepts. The following auxiliary definition allows the
extraction of the argument types and the result type from a method type and
helps us describe restrictions imposed on variable and method definitions for
classes or interfaces, given in chapters 8.2 and 9 of the language description.

Definition 3 For a method type $MT = T_1 \times ... \times T_n \rightarrow T$, we define the argument
types and the result type:

- $Args(MT) = T_1 \times ... \times T_n$

- $Res(MT) = T$

Next we introduce some functions to find the class components:

- $FDec(\Gamma, C, v)$ indicates the nearest superclass of C (possibly C itself) which
  contains a declaration of the instance variable v, together with its declared type;

- $FDecs(\Gamma, C, v)$ indicates all the field declarations for v which were declared
  in a superclass of C, and possibly hidden by C or another superclass;

- $MDecs(\Gamma, C, m)$ indicates all method declarations (i.e. both the class of the
  declaration and the signature) for method m in class C, or inherited from one
  of its superclasses and not hidden by any of its superclasses;

- $MSigs(\Gamma, C, m)$ returns all signatures for method m in class C, or inherited and
  not hidden by any of its superclasses.

Note that shadowed variables are treated differently from overridden methods.
Namely, shadowed variables are part of the set FDecs, whereas overridden
methods are not part of the set MDecs. The reason for the difference is that
shadowed variables need to be stored in the objects of subclasses (e.g. a FrPhil
object contains a like field inherited from the class Phil, even though this field
is shadowed in FrPhil), whereas overridden methods are never called by objects
of the subclasses (e.g. for FrPhil objects the only think method with a Phil
argument is that from FrPhil, whereas that defined in Phil is of no interest to
FrPhil objects).

From now on, we implicitly expect $\Gamma$ to have unique declarations and the
relations $\sqsubseteq$ and $\leq$ to be acyclic up to reflexivity. Thus the functions FDec, FDecs,
MDecs and MSigs are well-defined.
Definition 4 For an environment $\Gamma$ with a class declaration for C, i.e.

$\Gamma = \Gamma', C\ \mathsf{ext}\ C'\ \mathsf{impl}\ I_1, ... I_n \{v_1 : T_1, ... v_k : T_k, m_1 : MT_1, ... m_l : MT_l\}, \Gamma''$,

define:

- $FDec(\Gamma, \mathsf{Object}, v) = Undef$ for any v
  $FDec(\Gamma, C, v) = (C, T_j)$ iff $v = v_j$
  $FDec(\Gamma, C, v) = FDec(\Gamma, C', v)$ iff $v \neq v_j \ \ \forall j \in \{1...k\}$

- $FDecs(\Gamma, \mathsf{Object}, v) = \emptyset$
  $FDecs(\Gamma, C, v) = \{(C'', T'') \mid (C'', T'') = FDec(\Gamma, C, v)\} \cup FDecs(\Gamma, C', v)$

- $MDecs(\Gamma, \mathsf{Object}, m) = \emptyset$
  $MDecs(\Gamma, C, m) = \{(C, MT_j) \mid m = m_j\} \ \cup$
  $\{(C'', MT'') \mid (C'', MT'') \in MDecs(\Gamma, C', m) \text{ and } \forall j \in \{1...l\} : m = m_j \Rightarrow Args(MT_j) \neq Args(MT'')\}$

- $MSigs(\Gamma, C, m) = \{MT \mid \exists C'' \text{ with } (C'', MT) \in MDecs(\Gamma, C, m)\}$

The sets $FDecs(\Gamma, \mathsf{Object}, v)$ and $MDecs(\Gamma, \mathsf{Object}, m)$ should contain the entities
described in chapter 20.1 of the language description. We defined them as empty
sets for simplicity.

For the philosophers example the above functions are:

$FDec(\Gamma_0, Phil, like) = (Phil, Truth)$
$FDec(\Gamma_0, FrPhil, like) = (FrPhil, Food)$
$FDecs(\Gamma_0, Phil, like) = \{(Phil, Truth)\}$
$FDecs(\Gamma_0, FrPhil, like) = \{(Phil, Truth), (FrPhil, Food)\}$
$MDecs(\Gamma_0, Phil, think) = \{(Phil, Phil \rightarrow Phil), (Phil, FrPhil \rightarrow Book)\}$
$MDecs(\Gamma_0, FrPhil, think) = \{(FrPhil, Phil \rightarrow Phil), (Phil, FrPhil \rightarrow Book)\}$
$MSigs(\Gamma_0, Phil, think) = \{Phil \rightarrow Phil, FrPhil \rightarrow Book\}$
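The lookup functions of Definition 4 can be mirrored by a small recursive program over the philosophers environment. The sketch below is illustrative only: the class LookupSketch, its nested types and the string rendering of pairs are hypothetical, methods are restricted to a single argument type as in the example, and pairs are returned in discovery order rather than as mathematical sets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of FDec, FDecs and MDecs over the philosophers environment. */
final class LookupSketch {
    /** A method declaration together with the class that declares it. */
    static final class MethDecl {
        final String owner, name, args, result;
        MethDecl(String owner, String name, String args, String result) {
            this.owner = owner; this.name = name; this.args = args; this.result = result;
        }
        @Override public String toString() { return "(" + owner + "," + args + "->" + result + ")"; }
    }

    /** A class declaration: superclass, fields (name to type) and methods. */
    static final class ClassDecl {
        final String superName;
        final Map<String, String> fields = new LinkedHashMap<>();
        final List<MethDecl> methods = new ArrayList<>();
        ClassDecl(String superName) { this.superName = superName; }
    }

    static final Map<String, ClassDecl> ENV = new LinkedHashMap<>();
    static {
        ClassDecl phil = new ClassDecl("Object");
        phil.fields.put("like", "Truth");
        phil.methods.add(new MethDecl("Phil", "think", "Phil", "Phil"));
        phil.methods.add(new MethDecl("Phil", "think", "FrPhil", "Book"));
        ENV.put("Phil", phil);
        ClassDecl frPhil = new ClassDecl("Phil");
        frPhil.fields.put("like", "Food");
        frPhil.methods.add(new MethDecl("FrPhil", "think", "Phil", "Phil"));
        ENV.put("FrPhil", frPhil);
    }

    /** FDec: nearest declaration of field v, rendered as "(Class,Type)"; null stands for Undef. */
    static String fDec(String c, String v) {
        if (c.equals("Object")) return null;
        ClassDecl d = ENV.get(c);
        String type = d.fields.get(v);
        return type != null ? "(" + c + "," + type + ")" : fDec(d.superName, v);
    }

    /** FDecs: all declarations of v along the superclass chain, shadowed ones included. */
    static List<String> fDecs(String c, String v) {
        List<String> out = new ArrayList<>();
        if (c.equals("Object")) return out;
        String nearest = fDec(c, v);
        if (nearest != null) out.add(nearest);
        for (String inherited : fDecs(ENV.get(c).superName, v)) {
            if (!out.contains(inherited)) out.add(inherited);
        }
        return out;
    }

    /** MDecs: declarations of m in c, plus inherited ones that c does not override. */
    static List<MethDecl> mDecs(String c, String m) {
        List<MethDecl> out = new ArrayList<>();
        if (c.equals("Object")) return out;
        ClassDecl d = ENV.get(c);
        Set<String> ownArgs = new HashSet<>();
        for (MethDecl md : d.methods) {
            if (md.name.equals(m)) { out.add(md); ownArgs.add(md.args); }
        }
        for (MethDecl inherited : mDecs(d.superName, m)) {
            if (!ownArgs.contains(inherited.args)) out.add(inherited);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fDecs("FrPhil", "like"));
        System.out.println(mDecs("FrPhil", "think"));
    }
}
```

The shadowed field of Phil stays visible in `fDecs("FrPhil", "like")`, while the overridden method think with a Phil argument from Phil is absent from `mDecs("FrPhil", "think")`, which is the asymmetry discussed in the text.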

Similar to classes, we introduce the following functions to look up the interface
components: $MDecs(\Gamma, I, m)$ contains all method declarations (i.e. the
interface of the declaration and the signature) for method m in interface I, or
inherited (and not hidden) from any of its superinterfaces; $MSigs(\Gamma, I, m)$ returns
all signatures for method m in interface I, or inherited (and not hidden)
from a superinterface.

Definition 5 For an environment $\Gamma$ containing an interface declaration for I,
i.e. $\Gamma = \Gamma', I\ \mathsf{ext}\ I_1, ... I_n \{m_1 : MT_1, ... m_k : MT_k\}, \Gamma''$, we define:

- $MDecs(\Gamma, I, m) = \{(I, MT_j) \mid m = m_j\} \ \cup$
  $\{(I', MT') \mid \exists j \in \{1...n\} : (I', MT') \in MDecs(\Gamma, I_j, m) \text{ and } \forall i \in \{1...k\} : m = m_i \Rightarrow Args(MT') \neq Args(MT_i)\}$

- $MSigs(\Gamma, I, m) = \{MT \mid \exists I' : (I', MT) \in MDecs(\Gamma, I, m)\}$

The following lemma says that if a type T inherits a method signature from
another type T', i.e. if $(T', MT) \in MDecs(\Gamma, T, m)$, then T' is either a class or an
interface exporting that method, and no other superclass of T which is a subclass
of T' exports a method with the same identifier and argument types. Also, if a
class C inherits a field declaration for v, then there exists a C', a superclass of C
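\[
\Gamma \vdash \epsilon \;\diamond
\qquad
\frac{\Gamma \vdash \Gamma' \;\diamond \qquad \Gamma \vdash T \;\diamond_{VarType} \qquad \Gamma'(x) = Undef}
     {\Gamma \vdash \Gamma', x : T \;\diamond}
\qquad
\frac{\Gamma \vdash \Gamma \;\diamond}{\Gamma \vdash \diamond}
\]

\[
\frac{\begin{array}{l}
n \geq 0,\ k \geq 0,\ l \geq 0 \\
\Gamma \vdash \Gamma' \;\diamond \\
\Gamma'(C) = Undef \\
\text{NOT } \Gamma \vdash C' \sqsubseteq C \\
\Gamma \vdash C' \sqsubseteq C' \\
\Gamma \vdash T_j \;\diamond_{VarType} \\
\Gamma \vdash MT_j \;\diamond_{MethType} \\
v_i = v_j \Rightarrow i = j \quad j, i \in \{1...k\} \\
m_i = m_j \Rightarrow i = j \text{ or } Args(MT_i) \neq Args(MT_j) \\
\forall j \in \{1...l\} : MT \in MSigs(\Gamma, C', m_j),\ Args(MT) = Args(MT_j) \Rightarrow Res(MT_j) = Res(MT) \\
\forall m, \forall j \in \{1...n\} : AT \rightarrow T \in MSigs(\Gamma, I_j, m) \Rightarrow
  \exists T' \text{ with } AT \rightarrow T' \in MSigs(\Gamma, C, m),\ \Gamma \vdash T' \leq_{wdn} T
\end{array}}
{\Gamma \vdash \Gamma', C\ \mathsf{ext}\ C'\ \mathsf{impl}\ I_1, ... I_n \{v_1 : T_1, ... v_k : T_k, m_1 : MT_1, ... m_l : MT_l\} \;\diamond}
\]

\[
\frac{\begin{array}{l}
n \geq 0,\ l \geq 0 \\
\Gamma \vdash \Gamma' \;\diamond \\
\Gamma'(I) = Undef \\
\text{NOT } \Gamma \vdash I_j \leq I \quad j \in \{1...n\} \\
\Gamma \vdash MT_j \;\diamond_{MethType} \\
m_i = m_j \Rightarrow i = j \text{ or } Args(MT_i) \neq Args(MT_j) \\
i \in \{1...n\},\ MT \in MSigs(\Gamma, I_i, m_j),\ Args(MT) = Args(MT_j) \Rightarrow Res(MT_j) = Res(MT) \\
\forall i, j \in \{1...n\} : MT_1 \in MSigs(\Gamma, I_i, m),\ MT_2 \in MSigs(\Gamma, I_j, m) : \\
\qquad Args(MT_1) = Args(MT_2) \Rightarrow Res(MT_1) = Res(MT_2)
\end{array}}
{\Gamma \vdash \Gamma', I\ \mathsf{ext}\ I_1, ... I_n \{m_1 : MT_1, ... m_l : MT_l\} \;\diamond}
\]

Fig. 7. Well-formed environments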



which contains the declaration of v. This lemma is needed later in the subject 
reduction theorem when proving that there exists a redex in any well-typed 
non-ground term. 

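Lemma 1 For any environment $\Gamma$, types T, T' and identifiers v and m:

- $(T', MT) \in MDecs(\Gamma, T, m) \Rightarrow$

  • $\Gamma \vdash T \sqsubseteq T'$ and $\Gamma(T') = T'\ \mathsf{ext}\ ...\ \mathsf{impl}\ ... \{... m : MT ...\}$ and
    $\forall T'', C \neq T'$ with $\Gamma \vdash C \sqsubseteq T'$, $\Gamma \vdash T \sqsubseteq C$ :
    $\Gamma(C) \neq C\ \mathsf{ext}\ ...\ \mathsf{impl}\ ... \{... m : Args(MT) \rightarrow T''\}$

  or

  • $\Gamma \vdash T \leq T'$ and $\Gamma(T') = T'\ \mathsf{ext}\ ... \{... m : MT ...\}$ and $\forall T'', I \neq T'$ with
    $\Gamma \vdash I \leq T'$, $\Gamma \vdash T \leq I$ : $\Gamma(I) \neq I\ \mathsf{ext}\ ... \{... m : Args(MT) \rightarrow T''\}$

- $FDec(\Gamma, C, v) = (C', T') \Rightarrow$

  $\Gamma(C') = C' ... \{... v : T' ...\}$ and $\Gamma \vdash C \sqsubseteq C'$ and $\forall T'', C'' \neq C'$
  with $\Gamma \vdash C \sqsubseteq C''$, $\Gamma \vdash C'' \sqsubseteq C'$ : $\Gamma(C'') \neq C''\ \mathsf{ext}\ ...\ \mathsf{impl}\ ... \{... v : T''\}$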

The language description imposes the following requirements when a new
class C is declared as

$C\ \mathsf{ext}\ C'\ \mathsf{impl}\ I_1, ... I_n \{v_1 : T_1, ... v_k : T_k, m_1 : MT_1, ... m_l : MT_l\}$

- there can be sequences of superinterfaces, instance variable declarations, and
  instance method declarations;

- the previous declarations are well-formed;

- there is no prior declaration of C;

- there are no cyclic subclass dependencies between C' and C;

- the declarations of the class C', interfaces $I_j$ and variable types $T_j$ may
  precede or follow the declaration for C; this is why we require $\Gamma \vdash C' \sqsubseteq C'$
  rather than $\Gamma' \vdash C' \sqsubseteq C'$;

- the $MT_j$ are method types;

- instance variable identifiers are unique;

- instance methods with the same identifier must have different argument
  types;

- a method overriding an inherited method must have the same result type as
  the overridden method;

- "unless a class is abstract, the declarations of methods defined in each direct
  superinterface must be implemented either by a declaration in this class, or
  by an existing method declaration inherited from a superclass".

These requirements are formalized in the fourth rule in figure 7. Similar requirements
for interfaces are given in the language description, and their formalization is
given in the fifth rule in figure 7.

3.6 Properties of Well-Formed Environments 

It is straightforward to state and prove the following properties of well-formed
environments: two types that are in the subclass relationship are classes, $\sqsubseteq$ is
reflexive, transitive and antisymmetric, and the subclass hierarchy forms a tree.
Also, two types that are in the subinterface relationship are interfaces, and $\leq$ is
transitive, reflexive and antisymmetric. Unlike $\sqsubseteq$, $\leq$ does not form a tree.

Widening is reflexive, transitive and antisymmetric. If an interface widens
to another type, then the second type is a superinterface of the first. If a type
widens to a class, then the type is a subclass of that class. If a class widens
to an interface I, then the class implements a subinterface of I. If an interface
widens to another type, then the interface is identical to the type, or one of its
immediate superinterfaces is a subinterface of that type.

Finally, the following lemma states that if a type T widens to another type
T', and T' has a method m, then there exists in T a unique method m with the
same argument types, and whose return type is the same as that of the method
from T'.

Lemma 2 If $\Gamma \vdash \diamond$ and $\Gamma \vdash T \leq_{wdn} T'$, then $MSigs(\Gamma, T, m) \supseteq MSigs(\Gamma, T', m)$.

From now on we implicitly assume that all environments are well-formed.

4 Java_se, Enriching Java_s

Java_se is an enriched version of Java_s which provides compile-time type information
necessary at run-time. It is a subset of Java_A and corresponds to Java_light, both
defined in related work. The syntax of Java_se programs is described in figure 8. The
process of enriching Java_s terms is described by the mapping $\mathcal{C}$:

$\mathcal{C}$ : Environment × Java_s → Java_se



Program    ::= ( ClassBody )*
ClassBody  ::= ClassId ext ClassName {( MethBody )*}
MethBody   ::= MethId is (λ ParId : VarType.)* { Stmts ; return [Expr] }
Stmts      ::= ε | Stmts ; Stmt
Stmt       ::= if Expr then Stmts else Stmts
             | Var := Expr | Expr.MethName(Expr*) | throw Expr
             | try Stmts (catch ClassName Id Stmts)* finally Stmts
             | try Stmts (catch ClassName Id Stmts)+
Type       ::= VarType | void | nil | ClassName-Thrn
Expr       ::= Value | Var
             | Expr.[ArgType]MethName(Expr*)
             | new ClassName ≪(VarName ClassName Value)*≫
             | new SimpleType ([Expr])+ ([])* ⟦Value⟧
Var        ::= Name | Var[Expr] | this
             | Var.[ClassName]VarName
Value      ::= PrimValue | null
PrimValue  ::= intValue | charValue | byteValue | ...
VarType    ::= SimpleType | ArrayType
SimpleType ::= PrimType | ClassName | InterfaceName
ArrayType  ::= SimpleType[] | ArrayType[]
PrimType   ::= bool | char | int | ...

Fig. 8. Java_se programs



Java_se can be obtained from Java_s by applying enrichments in four cases.
Method calls are enriched by the signature of the most special applicable method
available at compile-time. Thus, the Java_s syntax Expr.MethName(Expr*) is
replaced in Java_se by the syntax Expr.[ArgType]MethName(Expr*). Instance
variable accesses are enriched by the class containing the field declaration. Thus,
Expr.VarName is replaced in Java_se by Expr.[ClassName]VarName. Object creation
is enriched by the names of all its fields, the classes they were declared in
and their initial, default values. Therefore, the Java_s syntax new ClassName is
replaced by the Java_se syntax new ClassName ≪(VarName ClassName Value)*≫.
Finally, array creation is enriched by the initial values to be stored in each
component of the new array. Therefore, the Java_s syntax new SimpleType ([Expr])+([])*
is replaced in Java_se by new SimpleType ([Expr])+([])* ⟦Value⟧.
Examples of the enriching of method call, of instance variable access and of object
creation can be seen in section 4.1. The Java_s array creation new int[3] would
be represented in Java_se as new int[3] ⟦0⟧.
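The idea that a call is enriched at compile-time with the argument types of the chosen signature, while the method body is still selected at run-time from the class of the receiver, can be observed in ordinary Java. The sketch below is only an illustration: the class DescriptorDemo and the helper methods callWithPhilArg and callWithFrPhilArg are hypothetical, the method bodies simply report which declaration ran, and the call pascal.think(pascal) is left out on purpose.

```java
/** Sketch: the signature is fixed statically, the body is chosen from the receiver's class. */
final class DescriptorDemo {
    static class Phil {
        String think(Phil y)   { return "Phil.think(Phil)"; }
        String think(FrPhil y) { return "Phil.think(FrPhil)"; }
    }

    static class FrPhil extends Phil {
        @Override String think(Phil y) { return "FrPhil.think(Phil)"; }
    }

    // The argument has static type Phil, so the compiler records the signature think(Phil).
    static String callWithPhilArg(Phil receiver, Phil arg) {
        return receiver.think(arg);
    }

    // The argument has static type FrPhil, so the compiler records the signature think(FrPhil).
    static String callWithFrPhilArg(Phil receiver, FrPhil arg) {
        return receiver.think(arg);
    }

    public static void main(String[] args) {
        Phil aPhil = new Phil();
        FrPhil pascal = new FrPhil();
        System.out.println(callWithPhilArg(aPhil, aPhil));     // body from Phil
        System.out.println(callWithPhilArg(pascal, pascal));   // same signature, body from FrPhil
        System.out.println(callWithFrPhilArg(aPhil, pascal));  // signature think(FrPhil)
        System.out.println(callWithFrPhilArg(pascal, pascal)); // inherited body from Phil
    }
}
```

In the second call the dynamic class of both receiver and argument is FrPhil, yet the recorded signature remains think(Phil); only the choice of body depends on the receiver's class.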



4.1 The Example in Java_se

The program p_s from section 3.3 would be mapped to the Java_se program p_se:

p_se = $\mathcal{C}\{\Gamma_0, p_s\}$ =
    Phil ext Object {
        think is λy:Phil. {...}
        think is λy:FrPhil. {...}
    }
    FrPhil ext Phil {
        think is λy:Phil. {this.[FrPhil]like := oyster; ...}
    }

The terms would be represented as:

    pascal := new FrPhil ≪like Phil nil, like FrPhil nil≫
    aPhil.[Phil]like ...
    aPhil.[Phil]think(aPhil)
    aPhil.[FrPhil]think(pascal) ...
    pascal.[FrPhil]like ...
    pascal.think(pascal)    !! ambiguous call
    pascal.[Phil]think(aPhil) ...

5 Java_s Types

The type rules for Java_s are given in figures 9, 10 and 11. They correspond to
the type checking phase of a Java compiler and have the form $\Gamma \vdash t : T$, which
means that term t has the type T in the environment $\Gamma$. The assertion $\Gamma \vdash p \;\diamond$
signifies that program p is well-formed under the environment $\Gamma$ (i.e. that all
expressions are type correct, and that all classes conform to their definitions),
whereas $\Gamma \vdash p \;\blacklozenge$ signifies that p is complete (i.e. well-formed, and it provides
a class body for each class declared in $\Gamma$).
In parallel with type checking, Java_s terms are enriched with type information.
Thus, each type rule is followed by an enrichment equation of the
form $\mathcal{C}\{\Gamma, t\} = t'$, meaning that the Java_s term t is enriched to the equivalent
Java_se term t'. The enrichment rules are given together with the type rules
because in some cases (i.e. for method call and field access) the enrichments use
type information.

Figure 9 describes the types for variables, primitive values, null, statements,
newly created objects and arrays, and field and array access.

According to the first rule, character literals have character type, integer 
literals have integer type etc. According to the second rule, a statement sequence 
has the same type as its last statement. A return statement has void type, or the 
same type as the expression it returns. An expression of type T' can be assigned 
to a variable of a type T if T' can be widened to T. A conditional consists of two 
statement sequences not necessarily of the same type. 

For a class C, the expression new C has type C. For a simple type T, the
expression $new\ T[e_1]...[e_n][\,]_1...[\,]_k$ is an (n+k)-dimensional array of elements of type
T. Array and object creation expressions are enriched with initialization information
that determines the values for component initialization. Initial values are
defined in ch. 4.5.5 of the language description; here they are given in the
following definition:

Definition 6 The initial value of a simple type is: 

— 0 is the initial value of int 

— is the initial value of char 

— false is the initial value of bool

— null is the initial value of classes, interfaces or nil 
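The initial values used for enrichment correspond to the default values that Java itself stores in freshly created array components. The following sketch, with the hypothetical class name InitialValuesDemo, makes these defaults visible; note that Java spells the boolean type `boolean`, and that the default `char` is the null character, shown here through its numeric code.

```java
/** Sketch: default values of freshly created array components in Java. */
final class InitialValuesDemo {
    static String describeDefaults() {
        int[] ints = new int[3];          // components start as 0
        char[] chars = new char[1];       // components start as the null character
        boolean[] bools = new boolean[1]; // components start as false
        Object[] refs = new Object[1];    // components start as null
        return ints[0] + " " + (int) chars[0] + " " + bools[0] + " " + refs[0];
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults()); // prints "0 0 false null"
    }
}
```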

For an array access v[e], the variable v should have an array type T[], and e
should be of integer type. For a field access v.f, the variable v should have a class
type T (because so far we only consider non-static fields, in Java_s only instances
have fields), one of whose superclasses (C) should contain a field declaration for
f of type T', i.e. $FDec(\Gamma, T, f) = (C, T')$, in which case the field access expression
has type T', and the information from which superclass the field declaration
is inherited is stored in the corresponding Java_se expression, i.e.
$\mathcal{C}\{\Gamma, v.f\} = \mathcal{C}\{\Gamma, v\}.[C]f$.

Figure 9 also contains the type rules for method bodies and method calls, as
in ch. 15.11 of the language description. A method is applicable if the actual
parameter types can be widened to the corresponding formal parameter types. A
signature is more special than another signature if and only if it is defined in a
subclass or subinterface and all argument types can be widened to the argument
types of the second signature; this defines a partial order. The most special
signatures are the minima of the "more special" partial order.

Definition 7 For an environment $\Gamma$, identifier m, variable types $T, T_1, ... T_n$,
the most special declarations are defined as follows:

- $ApplMeths(\Gamma, m, T, T_1 \times ... \times T_n) = \{(T', MT') \mid (T', MT') \in MDecs(\Gamma, T, m)$
  and $MT' = T'_1 \times ... \times T'_n \rightarrow T'_{n+1}$ and $\Gamma \vdash T_i \leq_{wdn} T'_i$ for $i \in \{1...n\}\}$
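i is an integer, c is a character, x is an identifier:

\[
\Gamma \vdash \mathsf{null} : \mathsf{nil}, \quad
\Gamma \vdash \mathsf{true} : \mathsf{bool}, \quad
\Gamma \vdash \mathsf{false} : \mathsf{bool}, \quad
\Gamma \vdash i : \mathsf{int}, \quad
\Gamma \vdash c : \mathsf{char}, \quad
\Gamma \vdash x : \Gamma(x)
\]
\[
\mathcal{C}\{\Gamma, z\} = z \quad \text{if } z \text{ is an integer, character, identifier, null, true, or false}
\]

\[
\frac{\Gamma \vdash stmts : T \qquad \Gamma \vdash stmt : T'}{\Gamma \vdash stmts ; stmt : T'}
\qquad
\frac{\Gamma \vdash e : \mathsf{bool} \qquad \Gamma \vdash stmts : \mathsf{void} \qquad \Gamma \vdash stmts' : \mathsf{void}}
     {\Gamma \vdash \mathsf{if}\ e\ \mathsf{then}\ stmts\ \mathsf{else}\ stmts' : \mathsf{void}}
\]
\[
\mathcal{C}\{\Gamma, stmts ; stmt\} = \mathcal{C}\{\Gamma, stmts\} ; \mathcal{C}\{\Gamma, stmt\}
\]
\[
\mathcal{C}\{\Gamma, \mathsf{if}\ e\ \mathsf{then}\ stmts\ \mathsf{else}\ stmts'\} =
\mathsf{if}\ \mathcal{C}\{\Gamma, e\}\ \mathsf{then}\ \mathcal{C}\{\Gamma, stmts\}\ \mathsf{else}\ \mathcal{C}\{\Gamma, stmts'\}
\]

\[
\frac{\Gamma \vdash v : T \qquad \Gamma \vdash e : T' \qquad \Gamma \vdash T' \leq_{wdn} T}{\Gamma \vdash v := e : \mathsf{void}}
\qquad
\frac{\Gamma \vdash e : T}{\Gamma \vdash \mathsf{return}\ e : T}
\]
\[
\mathcal{C}\{\Gamma, v := e\} = \mathcal{C}\{\Gamma, v\} := \mathcal{C}\{\Gamma, e\}
\qquad
\mathcal{C}\{\Gamma, \mathsf{return}\ e\} = \mathsf{return}\ \mathcal{C}\{\Gamma, e\}
\]

\[
\Gamma \vdash \mathsf{return} : \mathsf{void}
\qquad
\frac{\begin{array}{l}
\Gamma \vdash C \sqsubseteq C \\
\forall f, C', T' \text{ with } (C', T') \in FDecs(\Gamma, C, f) :\ \exists i \in \{1...n\} : f_i = f,\ C_i = C',\ T_i = T' \\
v_i \text{ initial for } T_i \quad i \in \{1...n\}
\end{array}}
{\Gamma \vdash \mathsf{new}\ C : C}
\]
\[
\mathcal{C}\{\Gamma, \mathsf{return}\} = \mathsf{return}
\qquad
\mathcal{C}\{\Gamma, \mathsf{new}\ C\} = \mathsf{new}\ C \ll f_1\ C_1\ v_1, ... f_n\ C_n\ v_n \gg
\]

\[
\frac{\begin{array}{l}
\Gamma \vdash T \;\diamond_{VarType};\ \text{NOT } \Gamma \vdash T \leq T \\
\Gamma \vdash e_i : \mathsf{int} \quad i \in \{1...n\},\ n \geq 1,\ k \geq 0 \\
v \text{ is initial for } T
\end{array}}
{\Gamma \vdash \mathsf{new}\ T[e_1]...[e_n][\,]_1...[\,]_k : T[\,]_1...[\,]_{n+k}}
\]
\[
\mathcal{C}\{\Gamma, \mathsf{new}\ T[e_1]...[e_n][\,]_1...[\,]_k\} =
\mathsf{new}\ T[\mathcal{C}\{\Gamma, e_1\}]...[\mathcal{C}\{\Gamma, e_n\}][\,]_1...[\,]_k [\![v]\!]
\]

\[
\frac{\Gamma \vdash v : T[\,] \qquad \Gamma \vdash e : \mathsf{int}}{\Gamma \vdash v[e] : T}
\qquad
\mathcal{C}\{\Gamma, v[e]\} = \mathcal{C}\{\Gamma, v\}[\mathcal{C}\{\Gamma, e\}]
\]

\[
\frac{\Gamma \vdash v : T \qquad FDec(\Gamma, T, f) = (C, T')}{\Gamma \vdash v.f : T'}
\qquad
\mathcal{C}\{\Gamma, v.f\} = \mathcal{C}\{\Gamma, v\}.[C]f
\]

\[
\frac{\Gamma \vdash e_i : T_i \quad i \in \{1...n\},\ n \geq 1 \qquad
      MostSpec(\Gamma, m, T_1, T_2 \times ... \times T_n) = \{(T, MT)\}}
     {\Gamma \vdash e_1.m(e_2 ... e_n) : Res(MT)}
\]
\[
\mathcal{C}\{\Gamma, e_1.m(e_2 ... e_n)\} =
\mathcal{C}\{\Gamma, e_1\}.[Args(MT)]m(\mathcal{C}\{\Gamma, e_2\} ... \mathcal{C}\{\Gamma, e_n\})
\]

Fig. 9. Types for Java_s expressions and statements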
- $(T, T_1 \times ... \times T_n \rightarrow T_{n+1})$ is more special than $(T', T'_1 \times ... \times T'_n \rightarrow T'_{n+1})$
  iff $\Gamma \vdash T \leq_{wdn} T'$ and $\forall i \in \{1...n\} : \Gamma \vdash T_i \leq_{wdn} T'_i$

- $MostSpec(\Gamma, m, T, T_1 \times ... \times T_n) = \{(T', MT') \mid$
  $(T', MT') \in ApplMeths(\Gamma, m, T, T_1 \times ... \times T_n)$ and
  if $(T'', MT'') \in ApplMeths(\Gamma, m, T, T_1 \times ... \times T_n)$
  and $(T'', MT'')$ is more special than $(T', MT')$
  then $T'' = T'$ and $MT' = MT''\}$

The signatures of the most special applicable methods are contained in the
set MostSpec. A message expression is type-correct when this set contains exactly
one pair. The argument types of the signature of this pair are stored as the method
descriptor, cf. ch. 15.11 of the language description, and the result type of the
signature is the type of the message expression.
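To make the role of MostSpec concrete, the following sketch computes the applicable and the most special declarations for the think calls of the philosophers example. It is a simplification under stated assumptions: only one-argument methods are handled, widening is reduced to the subclass relation, the MDecs sets are written out by hand from the values given earlier, and the names MostSpecSketch, Decl, applMeths, moreSpecial and mostSpec are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of ApplMeths and MostSpec for the method think of the philosophers example. */
final class MostSpecSketch {
    /** A declaration pair: declaring class and a one-argument signature. */
    static final class Decl {
        final String owner, arg, result;
        Decl(String owner, String arg, String result) {
            this.owner = owner; this.arg = arg; this.result = result;
        }
        @Override public String toString() { return "(" + owner + "," + arg + "->" + result + ")"; }
    }

    static final Map<String, String> SUPERCLASS = new HashMap<>();
    // MDecs for think, keyed by receiver type, copied from the example in the text.
    static final Map<String, List<Decl>> THINK_DECLS = new HashMap<>();
    static {
        SUPERCLASS.put("Phil", "Object");
        SUPERCLASS.put("FrPhil", "Phil");
        Decl philPhil = new Decl("Phil", "Phil", "Phil");
        Decl philFrPhil = new Decl("Phil", "FrPhil", "Book");
        Decl frPhilPhil = new Decl("FrPhil", "Phil", "Phil");
        THINK_DECLS.put("Phil", Arrays.asList(philPhil, philFrPhil));
        THINK_DECLS.put("FrPhil", Arrays.asList(frPhilPhil, philFrPhil));
    }

    /** Widening restricted to classes: t is u or a subclass of u. */
    static boolean widens(String t, String u) {
        for (String cur = t; cur != null; cur = SUPERCLASS.get(cur)) {
            if (cur.equals(u)) return true;
        }
        return false;
    }

    /** Declarations whose formal argument type the actual argument type widens to. */
    static List<Decl> applMeths(String receiverType, String argType) {
        List<Decl> out = new ArrayList<>();
        for (Decl d : THINK_DECLS.get(receiverType)) {
            if (widens(argType, d.arg)) out.add(d);
        }
        return out;
    }

    /** a is more special than b: declaring class and argument type both widen. */
    static boolean moreSpecial(Decl a, Decl b) {
        return widens(a.owner, b.owner) && widens(a.arg, b.arg);
    }

    /** The minima of the "more special" order among the applicable declarations. */
    static List<Decl> mostSpec(String receiverType, String argType) {
        List<Decl> applicable = applMeths(receiverType, argType);
        List<Decl> out = new ArrayList<>();
        for (Decl d : applicable) {
            boolean minimal = true;
            for (Decl other : applicable) {
                if (other != d && moreSpecial(other, d)) minimal = false;
            }
            if (minimal) out.add(d);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mostSpec("Phil", "FrPhil"));   // aPhil.think(pascal)
        System.out.println(mostSpec("FrPhil", "FrPhil")); // pascal.think(pascal)
    }
}
```

For pascal.think(pascal) the result contains two pairs, neither of which is more special than the other, which is why that call is flagged as ambiguous in section 4.1.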

Figure 10 describes the types for program, method or class bodies. The first
rule describes the type of a method body with parameters $x_1, ..., x_n$, consisting
of the statements stmts. The renaming of variables in the method body, namely
$stmts[z_1/x_1, ..., z_n/x_n]$, is necessary in order to avoid name clashes and also in
order for a later lemma to hold, as has been pointed out in related work. It is worth
noticing that the rules describing method bodies do not determine T; instead, the
expected return type of the method, T, is taken from the environment $\Gamma$ when
applying the next rule of the figure, which describes class bodies.

The second rule in figure 10 describes the type of a class body consisting
of method bodies $mBody_1, ... mBody_n$. Note that each $mBody_i$ is type checked
in the environment $\Gamma, this : C$, which does not contain the instance variable
declarations $v_1 : T_1 ..., v_k : T_k$. Thus, through the type system, we force the use
of the expression $this.v_j$ as opposed to $v_j$.

A program $p = cBody_1, ... cBody_n$ is well-formed, i.e. $\Gamma \vdash p \;\diamond$, if it contains no
more than one class body for each identifier, and if all class bodies $cBody_i$ are
well-typed and satisfy their declarations. Furthermore, each class is transformed
by $\mathcal{C}$. Finally, as described in the last rule of figure 10, a program is complete if
it is well-formed and it provides a class body for each of the classes declared in
the environment $\Gamma$. This is indicated by $\Gamma \vdash p \;\blacklozenge$.

The following two functions will be needed for the operational semantics. In
a class body cBody the function $MethBody(m, AT, cBody)$ finds the method body
with identifier m and argument types AT, if it exists. From the requirements for
classes in figure 7 it follows that for a well-formed environment $\Gamma$, the function
$MethBody(m, AT, cBody)$ returns either an empty set or a set with one element.
In a program p the function $MethBody(m, AT, C, p)$ finds the method body with
identifier m and argument types AT in the nearest superclass of class C, if it
exists. It returns a single pair consisting of the class with the appropriate method
body and the method body itself, or the empty set if none exists.

Definition 8 Given a class body $cBody = C\ \mathsf{ext}\ C' \{mBody_1, ... mBody_n\}$, argument
types AT, and a program p, we define method look up as follows:

(Footnote) Notice that in previous work we did not distinguish between well-formed and
complete, and the assertion $\Gamma \vdash p \;\diamond$ signified both.
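\[
\frac{\begin{array}{l}
mBody = m\ \mathsf{is}\ \lambda x_1 : T_1 ... \lambda x_n : T_n. \{stmts\} \\
x_i \neq \mathsf{this} \quad i \in \{1...n\} \\
z_1, ..., z_n \text{ are new variables in } \Gamma \\
stmts' = stmts[z_1/x_1, ..., z_n/x_n] \\
\Gamma, z_1 : T_1 ... z_n : T_n \vdash stmts' : T' \\
\Gamma \vdash T' \leq_{wdn} T
\end{array}}
{\Gamma \vdash mBody : T_1 \times ... \times T_n \rightarrow T}
\]
\[
\mathcal{C}\{\Gamma, mBody\} = m\ \mathsf{is}\ \lambda x_1 : T_1 ... \lambda x_n : T_n. \{\mathcal{C}\{\Gamma, stmts\}\}
\]

\[
\frac{\begin{array}{l}
n, k, l \geq 0,\ \Gamma \vdash \diamond \\
\Gamma(C) = C\ \mathsf{ext}\ C'\ \mathsf{impl}\ I_1 ... I_n \{v_1 : T_1 ... v_k : T_k, m_1 : MT_1 ... m_l : MT_l\} \\
cBody = C\ \mathsf{ext}\ C' \{mBody_1, ... mBody_l\} \\
\Gamma(\mathsf{this}) = Undef \\
mBody_i = m_i\ \mathsf{is}\ mPrsSts_i \quad i \in \{1...l\} \\
\Gamma, \mathsf{this} : C \vdash mBody_i : MT_i \quad i \in \{1...l\}
\end{array}}
{\Gamma \vdash cBody : \Gamma(C)}
\]
\[
\mathcal{C}\{\Gamma, cBody\} = C\ \mathsf{ext}\ C' \{\mathcal{C}\{(\Gamma, \mathsf{this} : C), mBody_1\} ... \mathcal{C}\{(\Gamma, \mathsf{this} : C), mBody_l\}\}
\]

\[
\frac{\begin{array}{l}
n \geq 0,\ p = cBody_1, ... cBody_n \\
cBody_i = C\ \mathsf{ext}\ ...,\ cBody_j = C\ \mathsf{ext}\ ... \Rightarrow i = j \\
\Gamma \vdash cBody_i \;\diamond \quad i \in \{1...n\}
\end{array}}
{\Gamma \vdash p \;\diamond}
\qquad
\frac{\Gamma \vdash p \;\diamond \qquad Classes(\Gamma) = Classes(p)}{\Gamma \vdash p \;\blacklozenge}
\]
\[
\mathcal{C}\{\Gamma, p\} = \mathcal{C}\{\Gamma, cBody_1\} ... \mathcal{C}\{\Gamma, cBody_n\}
\]

Fig. 10. Types for Java_s method bodies, class bodies, and program bodies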



- $MethBody(m, AT, cBody) = \{mBody_j \mid mBody_j = m\ \mathsf{is}\ \lambda x_1 : T_1 ... \lambda x_k : T_k. \{...\}$
  and $AT = T_1 \times ... \times T_k\}$

- $MethBody(m, AT, \mathsf{Object}, p) = \emptyset$

- $MethBody(m, AT, C, p) = (C, mBody)$ iff
  $MethBody(m, AT, cBody) = \{mBody\}$, where $cBody = p(C)$

- $MethBody(m, AT, C, p) = MethBody(m, AT, C', p)$ iff
  $MethBody(m, AT, cBody) = \emptyset$, where $p(C) = C\ \mathsf{ext}\ C' ...$
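The program-level lookup of Definition 8 amounts to walking up the superclass chain until a class body offers a method with the requested identifier and argument types. The sketch below is only an illustration: a program is modelled as a map from class names to labelled bodies, the labels stand in for real statements, and the names MethBodySketch, PROGRAM and methBody are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of method body lookup along the superclass chain. */
final class MethBodySketch {
    static final Map<String, String> SUPERCLASS = new HashMap<>();
    // Class name -> ("m(AT)" -> label standing in for the method body).
    static final Map<String, Map<String, String>> PROGRAM = new HashMap<>();
    static {
        SUPERCLASS.put("Phil", "Object");
        SUPERCLASS.put("FrPhil", "Phil");
        Map<String, String> phil = new HashMap<>();
        phil.put("think(Phil)", "body of think(Phil) in Phil");
        phil.put("think(FrPhil)", "body of think(FrPhil) in Phil");
        PROGRAM.put("Phil", phil);
        Map<String, String> frPhil = new HashMap<>();
        frPhil.put("think(Phil)", "body of think(Phil) in FrPhil");
        PROGRAM.put("FrPhil", frPhil);
    }

    /** Returns {class, body} for the nearest matching body, or null for the empty set. */
    static String[] methBody(String m, String at, String c) {
        for (String cur = c; cur != null && !cur.equals("Object"); cur = SUPERCLASS.get(cur)) {
            Map<String, String> bodies = PROGRAM.get(cur);
            String body = bodies == null ? null : bodies.get(m + "(" + at + ")");
            if (body != null) return new String[] { cur, body };
        }
        return null; // reached Object without finding a body
    }

    public static void main(String[] args) {
        System.out.println(methBody("think", "Phil", "FrPhil")[0]);   // FrPhil
        System.out.println(methBody("think", "FrPhil", "FrPhil")[0]); // Phil
    }
}
```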

In figure 11 we define the typing rules for exceptions. A throw statement has
the type void if the expression following the throw indicates an exception. We
require the expression not to be an address. For addresses the rules for Java_se,
given in a separate figure, apply. Similarly, the try ... catch ... finally statements
have the type void, provided that the constituent statement lists are well-typed, and
that the names of exception classes and new variables appear after each catch. The
additional Java requirements, that no class $E_i$ should appear more than once,
and that no class should appear preceded by a subclass, are expressed elsewhere but
are omitted here, since they do not affect the subject reduction property.
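\[
\frac{\Gamma \vdash e : E,\ e \neq \iota_i \qquad \Gamma \vdash E \sqsubseteq \mathsf{Exception}}
     {\Gamma \vdash \mathsf{throw}\ e : \mathsf{void}}
\qquad
\mathcal{C}\{\Gamma, \mathsf{throw}\ e\} = \mathsf{throw}\ \mathcal{C}\{\Gamma, e\}
\]

\[
\frac{\begin{array}{l}
n \geq 0,\ v_i, z_i \text{ new in } \Gamma \quad i \in \{1...n\} \\
\Gamma \vdash E_i \sqsubseteq \mathsf{Exception} \quad i \in \{1...n\} \\
\Gamma, z_i : E_i \vdash stmts_i[z_i/v_i] : \mathsf{void} \quad i \in \{1...n\} \\
\Gamma \vdash stmts_{n+1} : \mathsf{void}
\end{array}}
{\begin{array}{l}
\Gamma \vdash \mathsf{try}\ stmts_0\ \mathsf{catch}\ E_1\ v_1\ stmts_1 ... \mathsf{catch}\ E_n\ v_n\ stmts_n : \mathsf{void} \\
\Gamma \vdash \mathsf{try}\ stmts_0\ \mathsf{catch}\ E_1\ v_1\ stmts_1 ... \mathsf{catch}\ E_n\ v_n\ stmts_n\ \mathsf{finally}\ stmts_{n+1} : \mathsf{void}
\end{array}}
\]
\[
\begin{array}{l}
\mathcal{C}\{\Gamma, \mathsf{try}\ stmts_0\ \mathsf{catch}\ E_1\ v_1\ stmts_1 ... \mathsf{catch}\ E_n\ v_n\ stmts_n\} \\
\quad = \mathsf{try}\ \mathcal{C}\{\Gamma, stmts_0\}\ \mathsf{catch}\ E_1\ v_1\ \mathcal{C}\{\Gamma, stmts_1\} ... \mathsf{catch}\ E_n\ v_n\ \mathcal{C}\{\Gamma, stmts_n\} \\[4pt]
\mathcal{C}\{\Gamma, \mathsf{try}\ stmts_0\ \mathsf{catch}\ E_1\ v_1\ stmts_1 ... \mathsf{catch}\ E_n\ v_n\ stmts_n\ \mathsf{finally}\ stmts_{n+1}\} \\
\quad = \mathsf{try}\ \mathcal{C}\{\Gamma, stmts_0\}\ \mathsf{catch}\ E_1\ v_1\ \mathcal{C}\{\Gamma, stmts_1\} ... \mathsf{catch}\ E_n\ v_n\ \mathcal{C}\{\Gamma, stmts_n\}\ \mathsf{finally}\ \mathcal{C}\{\Gamma, stmts_{n+1}\}
\end{array}
\]

Fig. 11. Java_s types for exceptions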



5.1 Properties of the Java_s Type System

The following lemma says that the Java_s type system is deterministic, and that
in a complete Java_s program any class that widens to a superclass or superinterface
provides an implementation for each method exported by the superclass
or superinterface.

Lemma 3 For any well-formed environment $\Gamma$, variable types $T, T_1, ..., T_n, T_{n+1}$,
class C, Java_s program p, with $\Gamma \vdash p \;\blacklozenge$:

- If $\Gamma \vdash C \sqsubseteq C'$ then $\Gamma \vdash C \leq_{wdn} C'$

Furthermore, if

- $\Gamma \vdash C \leq_{wdn} T$

- $T_1 \times ... T_n \rightarrow T_{n+1} \in MSigs(\Gamma, T, m)$

then $\exists T'_{n+1}, C'$:

- $(C', T_1 \times ... T_n \rightarrow T_{n+1}) \in MDecs(\Gamma, C, m)$, and $\Gamma \vdash C \sqsubseteq C'$

- $MethBody(m, T_1 \times ... T_n, C, p) = (C', \lambda x_1 : T_1, ... \lambda x_n : T_n. \{stmts\})$ and
  $\Gamma, \mathsf{this} : C', x_1 : T_1, ... x_n : T_n \vdash stmts : T'_{n+1}$ and $\Gamma \vdash T'_{n+1} \leq_{wdn} T_{n+1}$

5.2 Absence of the Subsumption Rule 

The subsumption rule says that any expression of type T also has type T' if T is
a subtype of T'. In the case of Java, where subtypes are expressed by the $\leq_{wdn}$
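\[
\frac{\Gamma \vdash_{se} v : T \qquad \Gamma \vdash T \sqsubseteq C \qquad FDec(\Gamma, C, f) = (C, T')}
     {\Gamma \vdash_{se} v.[C]f : T'}
\]

\[
\frac{\begin{array}{l}
\Gamma \vdash_{se} e_i : T'_i \quad i \in \{1...n\},\ n \geq 1 \\
\Gamma \vdash T'_i \leq_{wdn} T_i \quad i \in \{2...n\} \\
FirstFit(\Gamma, m, T'_1, T_2 \times ... \times T_n) = \{(T, MT)\}
\end{array}}
{\Gamma \vdash_{se} e_1.[T_2 \times ... \times T_n]m(e_2 ... e_n) : Res(MT)}
\]

\[
\frac{\begin{array}{l}
\Gamma \vdash C \sqsubseteq C \\
\forall f, C', T' \text{ with } (C', T') \in FDecs(\Gamma, C, f) :\ \exists i \in \{1...n\} : f_i = f,\ C_i = C',\ T_i = T' \\
\Gamma \vdash_{se} v_i : T_i \quad i \in \{1...n\}
\end{array}}
{\Gamma \vdash_{se} \mathsf{new}\ C \ll f_1\ C_1\ v_1, ... f_n\ C_n\ v_n \gg\ : C}
\]

\[
\frac{\begin{array}{l}
n \geq 1,\ k \geq 0 \\
\Gamma \vdash T \;\diamond_{VarType};\ \text{NOT } \Gamma \vdash T \leq T \\
\Gamma \vdash_{se} e_i : \mathsf{int} \quad i \in \{1...n\} \\
\Gamma \vdash_{se} v : T
\end{array}}
{\Gamma \vdash_{se} \mathsf{new}\ T[e_1]...[e_n][\,]_1...[\,]_k [\![v]\!] : T[\,]_1...[\,]_{n+k}}
\]

Fig. 12. Differences between Java_se types and Java_s types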



relation, it would have had the form:

\[
\frac{\Gamma \vdash e : T \qquad \Gamma \vdash T \leq_{wdn} T'}{\Gamma \vdash e : T'}
\]

The type system introduced in this paper does not obey the subsumption
rule. For instance, the type of aPhil.like is Truth, but the type of pascal.like
is Food, though $\Gamma_0 \vdash aPhil : Phil$, $\Gamma_0 \vdash pascal : FrPhil$, and
$\Gamma_0 \vdash FrPhil \leq_{wdn} Phil$. In fact, introduction of the subsumption rule would make
this type system non-deterministic, although a system for Java with a subsumption
rule has been developed in related work, in which the types of method call and field
access are determined by using the minimal types of the expressions.
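The behaviour that rules out subsumption can be reproduced in plain Java, where a field access is resolved from the static type of the expression on its left. The sketch below uses the field like and the types Truth and Food of the philosophers example; the enclosing class FieldHidingDemo and the two helper methods are hypothetical and exist only to fix the static type of the receiver.

```java
/** Sketch: the field selected by p.like depends on the static type of p. */
final class FieldHidingDemo {
    static class Truth {}
    static class Food {}

    static class Phil { Truth like = new Truth(); }
    static class FrPhil extends Phil { Food like = new Food(); } // hides Phil.like

    // Static receiver type Phil: the access denotes the field declared in Phil.
    static String likeSeenAsPhil(Phil p) {
        return p.like.getClass().getSimpleName();
    }

    // Static receiver type FrPhil: the access denotes the field declared in FrPhil.
    static String likeSeenAsFrPhil(FrPhil p) {
        return p.like.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        FrPhil pascal = new FrPhil();
        System.out.println(likeSeenAsPhil(pascal));   // prints "Truth"
        System.out.println(likeSeenAsFrPhil(pascal)); // prints "Food"
    }
}
```

The same object yields a Truth or a Food depending on the static type under which it is accessed, so giving pascal the type Phil by subsumption would change the type of pascal.like.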



6 Extending the Type Rules to Java_se

After giving types to Java_s terms, we also give types to Java_se terms. However,
the rationale for typing the two languages is different: Java_s typing corresponds
to typing performed by a Java compiler, and it determines whether a term is
well-formed. Java_se typing, on the other hand, does not correspond to type
checking actually performed; it is needed in order to express the subject reduction
theorem. A Java_se term that has emerged by enriching a well-typed Java_s term
will be well-typed too, and will have the same type as the latter, cf. lemma 5.
Therefore, the Java_se type rules correspond to Java_s type rules, except where
the expressions have different syntax.

Figure 12 contains the four cases where Java_se syntax differs from that of
Java_s, and therefore where Java_se types differ from Java_s types. The assertion
$\Gamma \vdash_{se} t : T$ signifies that the Java_se term t has type T in the Java_se type system.
Thus, we use the subscript se to distinguish between type systems.

The first rule describes field access. The difference between the type of a field
access expression in Java_s and Java_se is that in Java_se the type depends on the
descriptor (i.e. C) instead of the type of the variable on the left of the field access
(i.e. T).
In the second rule we consider Java_se method calls: we search for appropriate
methods using the descriptor signature, $(T_2 \times ... \times T_n)$, instead of the types of
the actual expressions, $(T'_2 \times ... \times T'_n)$. For this search we first examine the class of
the receiver expression for a method body with appropriate argument types, and
then its superclasses:

Definition 9 For environment $\Gamma$, identifier m, type $T_1$, argument types AT, we
define:

$FirstFit(\Gamma, m, T_1, AT) = \{(T, MT) \mid (T, MT) \in MDecs(\Gamma, T_1, m) \text{ and } Args(MT) = AT\}$

The last two rules describe object and array creation. The requirements are
the same as those for Java_s, except that we additionally require the initialization
values to be of the appropriate type.

6.1 Properties of the Javase Type System 

The following lemma states that no more than one signature with argument 
types AT can be found for a type T. This signature will always be found in a 
superclass or superinterface of T. Also, once such a signature is found, the same 
signature can be found for any subclass or subinterface of T. 

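Lemma 4 For a well-formed environment $\Gamma$, types T, T', T'', and argument
types AT:

- $card(FirstFit(\Gamma, m, T, AT)) \leq 1$

- $\exists MT : FirstFit(\Gamma, m, T, AT) = \{(T', MT)\} \implies \Gamma \vdash T \leq_{wdn} T'$

- $\exists MT : FirstFit(\Gamma, m, T, AT) = \{(T', MT)\}$ and $\Gamma \vdash T'' \leq_{wdn} T$ $\implies$
  $\exists T''' : FirstFit(\Gamma, m, T'', AT) = \{(T''', MT)\}$ and $\Gamma \vdash T''' \leq_{wdn} T'$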

Not surprisingly, a well-typed Javas expression of type T is enriched into a 
Javase expression which has the type T as well. 

Lemma 5 For types T, T', environment $\Gamma$, Javas term t:

\[ \Gamma \vdash t : T \implies \Gamma \vdash_{se} \mathcal{C}\{\Gamma, t\} : T \]

7 Javar, the Run Time Language

As we said in the previous section, Javase is an enriched version of Javas, enriched
with compile-time type information necessary at run-time. However, at run time,
new terms may be reached, whose syntax is not covered by Javase. For this, we
further extend Javase, to obtain Javar, the run time language. Javar is a pure
superset of Javase; it corresponds to Don Syme’s JavaR, with the
difference that Javar also allows for exceptions. Javar is a superset of Javalight,
because Javalight does not describe additional artifacts that may arise
at run-time only.

Program    ::= ( ClassBody )*
ClassBody  ::= ClassId ext ClassName { ( MethBody )* }
MethBody   ::= MethId is ( $\lambda$ ParId : VarType. )* { Stmts ; return [Expr] }
Stmts      ::= $\epsilon$ | Stmts ; Stmt
Stmt       ::= if Expr then Stmts else Stmts
             | Var := Expr | Expr.MethName(Expr*) | throw Expr
             | try Stmts ( catch ClassName Id Stmts )* finally Stmts
             | try Stmts ( catch ClassName Id Stmts )+
Type       ::= VarType | void | nil | ClassName-Thrn
Expr       ::= Value | Var
             | Expr.[ArgType]MethName(Expr*)
             | new ClassName $\ll$ ( VarName ClassName Value )* $\gg$
             | new SimpleType ( [Expr] )+ ( [] )* $[\![$ Value $]\!]$
             | Stmts
Var        ::= Name | Var[Expr] | this
             | Var.[ClassName]VarName
             | $\iota_i$.[ClassName]VarName | $\iota_i$[Expr]        i an integer
             | null.[ClassName]VarName | null[Expr]
Value      ::= PrimValue | null | RefValue
RefValue   ::= $\iota_i$        i an integer
PrimValue  ::= intValue | charValue | byteValue | ...
VarType    ::= SimpleType | ArrayType
SimpleType ::= PrimType | ClassName | InterfaceName
ArrayType  ::= SimpleType[ ] | ArrayType[ ]
PrimType   ::= bool | char | int | ...

Fig. 13. Javar programs



These additional artifacts that may arise at run-time and are not part of
Javase, but are part of Javar, arise through addresses, the null value, and
statements as expressions. Addresses have the form $\iota_i$; they represent references
to objects and arrays, and may appear wherever a value is expected, as well
as in array and field accesses. Therefore, Javar variables may have the form
$\iota_i$.[ClassName]VarName, or $\iota_i$[Expr], and expressions may have the form $\iota_i$. An
access to null may arise during evaluation of array or field access variables,
therefore Javar expressions may have the form null.[ClassName]VarName, or
null[Expr]. Furthermore, in order to describe method evaluation through in-line
expansion rather than closures and stacks, in Javar we allow an expression
to consist of a sequence of statements, so that in the operational semantics a
method call can be rewritten to a statement sequence.

7.1 Extending the Type Rules to Javar

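As stated in the previous section, we gave types to Javase terms, in order to be
able to formulate a subject reduction theorem. We shall now have to extend these
to cover the types of Javar. The Javar type rules correspond to the Javase type rules,
except where Javar introduces new syntax, or where necessities of the subject
reduction theorem proof require otherwise.

\[ \frac{\sigma(\iota_i) = \ll \ldots \gg^{C}}{\Gamma, \sigma \vdash_r \iota_i : C} \qquad \frac{\sigma(\iota_i) = [\![ \ldots ]\!]^{T[]_1 \ldots []_n}}{\Gamma, \sigma \vdash_r \iota_i : T[]_1 \ldots []_n} \qquad \frac{\Gamma \vdash T \leq_{wdn} Object}{\Gamma, \sigma \vdash_r \mathtt{null} : T} \]

\[ \frac{\Gamma, \sigma \vdash_r v[e] : T \qquad \Gamma, \sigma \vdash_r e' : T' \qquad T, T' \neq E\text{-Thrn}}{\Gamma, \sigma \vdash_r v[e] := e' : void} \]

\[ \frac{v \neq v'[e'] \text{ for any } v', e' \qquad \Gamma, \sigma \vdash_r v : T \qquad \Gamma, \sigma \vdash_r e : T' \qquad \Gamma \vdash T' \leq_{wdn} T}{\Gamma, \sigma \vdash_r v := e : void} \]

\[ \frac{\Gamma \vdash E \sqsubseteq Exception \qquad \sigma(\iota_i) = \ll \ldots \gg^{E}}{\Gamma, \sigma \vdash_r \mathtt{throw}\ \iota_i : E\text{-Thrn}} \qquad \frac{cont \text{ is a context} \qquad \Gamma, \sigma \vdash_r t : E\text{-Thrn}}{\Gamma, \sigma \vdash_r cont\lceil t \rceil : E\text{-Thrn}} \]

Fig. 14. Difference between Javar and Javase types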

The type of an address ($\iota_i$) depends on the object or array pointed at in
the current state $\sigma$ (states are introduced in section 8); therefore, the type of
a Javar term depends on both the environment and the state, and this is why
type assertions for Javar terms t have the form $\Gamma, \sigma \vdash_r t : T$. Again, we use a
subscript in order to distinguish between the three type systems in our approach.

Figure 14 contains the seven cases where Javar types differ from Javase types.
The reasons for the differences can be classified into three categories. Firstly,
those that give types to expressions that may only arise during program execution
but do not involve exceptions (i.e. the rules for addresses and for null).
Secondly, those that give types to terms that would be type incorrect
in Javas (i.e. typing of assignments, and the rule for null). Thirdly, those that
give types to terms enclosing a thrown exception (the last two rules). The rules in
the first category give the same type as that given if the address or null were
replaced by an identifier of an appropriate class or array type. The rules in the
second category make type-correct terms which would have been type-incorrect
in Javase. However, the evaluation of such terms does not corrupt the integrity
of the system, since the operational semantics requires run-time checks to be
performed, and exceptions to be thrown, if certain conditions are not satisfied.
The rules in the third category involve the type E-Thrn, a type which was not
available in Javase, or Javas.

We now discuss these seven rules in more detail.

The first two rules in figure 14 describe the types of addresses. If an object
is stored at address $\iota_i$, i.e. $\sigma(\iota_i) = \ll \ldots \gg^{C}$, then its class, C, is the type of $\iota_i$. If
a k-dimensional array of T is stored at such an address, i.e. $\sigma(\iota_i) = [\![ \ldots ]\!]^{T[]_1 \ldots []_k}$,
then $T[]_1 \ldots []_k$ is the type of this reference.

The third rule says that null has any reference type. This rule is required
in order to be able to give a type to terms like null[j-4], which, although
type incorrect in Javas, may arise during execution of Javar terms. Such terms
ultimately lead to exceptions, but they do not immediately raise the exception
NullPE, because the Java semantics requires other parts of the expression to be
evaluated first; in our example, j-4 has to be evaluated first. In order to be
able to prove the subject reduction theorem, such expressions need a type. The
effect of this rule is that Javar terms do not have unique types.
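The evaluation order that motivates the rule for null can be observed directly in Java. The following sketch is only an illustration; the class name NullIndexOrder and the helper index, which stands in for the subexpression j-4, are made up for this purpose. It checks that the index expression is evaluated before the null array reference raises an exception.

```java
class NullIndexOrder {
    static int evaluated = 0;

    // Stands in for the index subexpression j-4; counts how often it runs.
    static int index(int j) {
        evaluated++;
        return j - 4;
    }

    // Returns true iff the index expression ran exactly once before the
    // NullPointerException for the null array reference was raised.
    static boolean indexRunsBeforeNullCheck() {
        int[] arr = null;
        evaluated = 0;
        try {
            int unused = arr[index(7)];
            return false; // not reached: the access cannot succeed
        } catch (NullPointerException e) {
            return evaluated == 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(indexRunsBeforeNullCheck());
    }
}
```

In terms of the rewrite system, the intermediate term with null to the left of the brackets and an unevaluated index is exactly the kind of term that needs a type for subject reduction to be stated.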

The next two rules describe assignments. The Javar array assignment
rule, suggested to us by Don Syme, only requires the left hand side
and the right hand side to be type-correct. It is weaker than the corresponding
assignment type rule in Javas, or Javase: it does not require the right hand side
to be of a type that can be widened to that of the left hand side. The reason for
this weaker requirement is that the type of an array component may become
narrower during evaluation. For example, if z is a one dimensional array of
Phil, then the assignment z[3] := aPhil is type-correct. However, if at run-time
z happens to contain a reference to an array of FrPhil, i.e. $\sigma(z) = \iota_i$ and $\sigma(\iota_i) = [\![ \ldots ]\!]^{FrPhil[]}$,
then z[3] := aPhil will be rewritten to $\iota_i$[3] := aPhil. Should this
term be considered type correct? A term y[3] := aPhil would be type incorrect
if y was declared as an array of FrPhil. On the other hand, the evaluation of the
term $\iota_i$[3] := aPhil will not stop here. The right hand side, in that case
aPhil, will be evaluated, and if it returns a value which is of a subclass of FrPhil
then the assignment will be performed, otherwise an exception will be thrown.
Therefore, in order to be able to prove subject reduction, the intermediate term
$\iota_i$[3] := aPhil has to be considered type correct in Javar. Interestingly, such a
distinction between types for array assignments and other assignments is not
necessary when using large step operational semantics.
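The situation discussed above corresponds to ordinary Java array covariance. The sketch below is an illustration only; the empty classes Phil and FrPhil are stand-ins for the classes of the philosophers example, and the helper store is hypothetical. A variable of static type Phil[] holds an array whose run-time type is FrPhil[], so the statically accepted assignment is checked again when it is executed.

```java
class ArrayStoreDemo {
    static class Phil {}
    static class FrPhil extends Phil {}

    // Performs z[3] = value and reports what happened at run time.
    static String store(Phil[] z, Phil value) {
        try {
            z[3] = value;
            return "stored";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }

    public static void main(String[] args) {
        Phil[] z = new FrPhil[5]; // static type Phil[], run-time type FrPhil[]
        System.out.println(store(z, new Phil()));   // value is not a FrPhil
        System.out.println(store(z, new FrPhil())); // value fits the array
    }
}
```

The first call is accepted by the compiler but rejected by the run-time check, which is why the intermediate term must still be typable even though no static assignment rule would accept it.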

Finally, the last two rules in figure 14 deal with exceptions that have actually
been thrown. Namely, the term throw new E$\ll \ldots \gg$ indicates potential throwing of
an exception, and would be rewritten to the term throw $\iota_i$, where $\iota_i$ is the
address of an object of class E. The latter term indicates an exception which has
actually been thrown, and, according to the rules, it has the type E-Thrn. The
context of an exception, defined in figure 15, encompasses all enclosing terms up
to the nearest enclosing try...catch clause, i.e. up to the first possible position
at which the exception might be handled. According to the last rule in figure 14,
the type of a term which is a context for a thrown exception of class E is E-Thrn.
This rule allows the typing of a message expression one of whose arguments
threw an exception, assignments whose left hand or right hand side threw an
exception, etc.

Context  ::= ExprCont | VarCont | StmtCont
VarCont  ::= VarCont.[ClassName]VarName | VarCont[Expr]
           | Var[ExprCont] | $\lceil \cdot \rceil$
ExprCont ::= new VarType [Expr]$_1$ ... [ExprCont]$_k$ ... [Expr]$_n$
           | ExprCont.[ArgType]MethName(Expr$_1$, ... Expr$_n$)
           | Expr.[ArgType]MethName(Expr$_1$, ... ExprCont$_k$, ... Expr$_n$)
             where $n \geq 1$, $1 \leq k \leq n$
StmtCont ::= VarCont := Expr | Var := ExprCont
           | if ExprCont then Stmts else Stmts
           | StmtCont ; Stmt | return ExprCont | throw ExprCont

Fig. 15. Javar exception contexts



7.2 Properties of the Javar Type System

Trivially, any well-typed Javase expression retains its type for any state $\sigma$.

Lemma 6 For types T, environment $\Gamma$, Javase term t:

\[ \Gamma \vdash_{se} t : T \implies \forall \text{ states } \sigma : \Gamma, \sigma \vdash_r t : T \]

Notice that the opposite direction does not hold. For example, for a variable
diningFrPhils of type FrPhil[], the Javar term diningFrPhils[3] := aPhil
is type correct, but the corresponding Javase term, diningFrPhils[3] := aPhil,
is not. Furthermore, Javar expressions may have more than one type.

The type E-Thrn characterizes Javar terms that contain actually thrown exceptions.
Thus, the type E-Thrn can only be encountered when typing Javar terms.

Lemma 7 For any Javar term t:

\[ \Gamma, \sigma \vdash_r t : E\text{-Thrn} \implies \exists \text{ context } t' \text{ and reference } \iota_i : t = t'\lceil \mathtt{throw}\ \iota_i \rceil \text{ and } \sigma(\iota_i) = \ll \ldots \gg^{E} \]


8 The Operational Semantics 

Figure 16 describes the run-time model for the operational semantics. For a
given program p, the operational semantics maps configurations to new
configurations. Configurations are tuples of Javar terms and states, or just states.
The operational semantics is a mapping from programs and configurations to
configurations.

The state is flat; it consists of mappings from identifiers to primitive values
or to references, and from references to objects or arrays. Note that references
may point to objects, or arrays, but they may not point to other references,
primitive values, or null; this is so because pointers in Java are implicit, and
there are no pointers to pointers.

An object is annotated by its class, and it consists of a sequence of labels and
values. Each label also carries the class in which it was defined; this is needed for
labels shadowing labels from superclasses, c.f. ch. 9.5. For the philosophers
example, $\ll$ like Phil: $\iota_2$, like FrPhil: null $\gg^{FrPhil}$ is an object of class
FrPhil. It inherits the field like from Phil, and has the field like from FrPhil.
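The need to tag each label with its declaring class mirrors field hiding in Java. The sketch below is illustrative only: the classes Food, Phil and FrPhil are assumed to have the shape suggested by the example object (a field like in each of Phil and FrPhil), and the class name FieldShadowing is made up. It shows that both fields coexist in one object and that the static type of the access selects which one is used.

```java
class FieldShadowing {
    static class Food {}
    static class Phil { Phil like; }
    static class FrPhil extends Phil { Food like; } // hides Phil.like; both fields exist in a FrPhil

    // Writes the field declared in Phil and checks that the field declared
    // in FrPhil is unaffected.
    static boolean bothFieldsCoexist() {
        FrPhil fr = new FrPhil();
        Phil asPhil = fr;          // same object, viewed with static type Phil
        asPhil.like = new Phil();  // selects the field labeled (like, Phil)
        return fr.like == null             // field (like, FrPhil) is still null
            && ((Phil) fr).like != null;   // field (like, Phil) holds a reference
    }

    public static void main(String[] args) {
        System.out.println(bothFieldsCoexist());
    }
}
```

The object built here has the same shape as the example object above: one value under the label like from Phil, and null under the label like from FrPhil.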

The following state $\sigma_0$ contains mappings according to the philosophers
example:

$\sigma_0(aPhil) = \iota_1$

$\sigma_0(oyster) = \iota_3$

$\sigma_0(\iota_1) = \ll like\ Phil : \iota_2,\ like\ FrPhil : null \gg^{FrPhil}$

$\sigma_0(\iota_2) = \ldots$

$\sigma_0(\iota_3) = \ldots$

Arrays carry their dimension and type information, and they consist of a
sequence of values for the first dimension. For example, $[\![ 3, 5, 8, \ldots ]\!]^{int[]}$ is a one
dimensional array of integers.



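Configuration ::= $\langle$ Javar term, state $\rangle$ $\cup$ $\langle$ state $\rangle$
$\leadsto$             :  Javar program $\longrightarrow$ Configuration $\longrightarrow$ Configuration
$\leadsto_p$           :  Configuration $\longrightarrow$ Configuration
State         ::= ( Ident $\longrightarrow$ Value )* $\cup$ ( RefValue $\longrightarrow$ ObjectOrArray )*
ObjectOrArray ::= Object | Array
Object        ::= $\ll$ ( LabelName ClassName : Value )* $\gg^{ClassName}$
Array         ::= $[\![$ ( Value )* $]\!]^{ArrayType}$

Fig. 16. Javar run-time model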



8.1 State, Object Operations, Ground Terms 

In this section we define operations on objects, arrays and states. These operations
are well-defined only if the object, array or state “conforms” to the types
expected by the environment, a requirement introduced in definition 13.

Definition 10 For object $obj = \ll l_1\ C_1 : val_1,\ l_2\ C_2 : val_2, \ldots, l_n\ C_n : val_n \gg^{C}$,
state $\sigma$, value val, reference $\iota_i$, identifier or reference z, class C, field
identifier f, integers m, k with $m > 0$, array $arr = [\![ val_0, \ldots val_{n-1} ]\!]^{T[]_1 \ldots []_m}$, we define:

- the access to field f declared in class C as obj(f, C):

  $obj(f, C) = val_i$ if $f = l_i$ and $C = C_i$

- the access to component f, C of an object stored at reference $\iota_i$, in state $\sigma$:

  $\sigma(\iota_i, f, C) = \sigma(\iota_i)(f, C)$

- the access to the k-th component of arr, arr[k]:

  $arr[k] = val_k$ if $0 \leq k \leq n-1$

- a new state, $\sigma' = \sigma[z \mapsto val]$, such that:

  $\sigma'(z) = val$

  $\sigma'(z') = \sigma(z')$ for $z' \neq z$

- a new object, $obj' = obj[f, C \mapsto val]$, and a new state, $\sigma' = \sigma[\iota_i, f, C \mapsto val]$:

  $obj'(f, C) = val$

  $obj'(f', C') = obj(f', C')$ if $f \neq f'$ or $C \neq C'$

  $\sigma' = \sigma[\iota_i \mapsto \sigma(\iota_i)[f, C \mapsto val]]$

- a new array, $arr' = arr[k \mapsto val]$, and a new state, $\sigma' = \sigma[\iota_i, k \mapsto val]$:

  $arr'[k] = val$

  $arr'[j] = arr[j]$ if $j \neq k$

  $\sigma' = \sigma[\iota_i \mapsto \sigma(\iota_i)[k \mapsto val]]$
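The object and state operations of definition 10 can be mimicked with immutable updates over maps. The following Java sketch is an assumption-laden illustration, not the paper's model: addresses and identifiers are both represented as strings (such as "i1"), values are plain Java objects, and the names Obj, update and updateField are hypothetical. Each update returns a new state and leaves the old one untouched, matching the functional reading of $\sigma[z \mapsto val]$.

```java
import java.util.HashMap;
import java.util.Map;

class StateSketch {
    // An object: its class annotation plus values keyed by (label, declaring class).
    static final class Obj {
        final String cls;
        final Map<String, Object> fields;

        Obj(String cls, Map<String, Object> fields) {
            this.cls = cls;
            this.fields = fields;
        }

        static String key(String f, String c) { return f + "@" + c; }

        // obj(f, C)
        Object get(String f, String c) { return fields.get(key(f, c)); }

        // obj[f, C |-> val]: a copy that differs only in the field (f, C)
        Obj with(String f, String c, Object val) {
            Map<String, Object> copy = new HashMap<>(fields);
            copy.put(key(f, c), val);
            return new Obj(cls, copy);
        }
    }

    // sigma[z |-> val]: a new state; sigma itself is not modified.
    static Map<String, Object> update(Map<String, Object> sigma, String z, Object val) {
        Map<String, Object> copy = new HashMap<>(sigma);
        copy.put(z, val);
        return copy;
    }

    // sigma[iota, f, C |-> val] = sigma[iota |-> sigma(iota)[f, C |-> val]]
    static Map<String, Object> updateField(Map<String, Object> sigma, String iota,
                                           String f, String c, Object val) {
        Obj obj = (Obj) sigma.get(iota);
        return update(sigma, iota, obj.with(f, c, val));
    }

    // The part of the example state that is fully given in the text.
    static Map<String, Object> sigma0() {
        Map<String, Object> fields = new HashMap<>();
        fields.put(Obj.key("like", "Phil"), "i2");
        fields.put(Obj.key("like", "FrPhil"), null);
        Map<String, Object> sigma = new HashMap<>();
        sigma.put("aPhil", "i1");
        sigma.put("oyster", "i3");
        sigma.put("i1", new Obj("FrPhil", fields));
        return sigma;
    }

    public static void main(String[] args) {
        Map<String, Object> s1 = updateField(sigma0(), "i1", "like", "FrPhil", "i3");
        System.out.println(((Obj) s1.get("i1")).get("like", "FrPhil"));
    }
}
```

The sketch does not check that the accessed field exists; in the definition the operations are only well-defined for conforming objects and states.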

We distinguish ground terms, which cannot be further rewritten, and l-ground
terms, which are “almost ground” and may not be further rewritten if they
appear on the left hand side of an assignment:

Definition 11 A Javar term t is

- ground iff t is a primitive value, or t = null, or t = $\iota_i$ for some i;

- l-ground iff t = id for some identifier id, or t = $\iota_i$.[C]f for a class C and a
  field f, or t = null.[C]f, or t = $\iota_i$[k] or t = null[k] for some integer k.



8.2 Program Execution 

Figures 17 to 21 describe the rewriting of Javar terms. We chose small
step semantics because we found this more intuitive. Interestingly, it turns out
that large step semantics allow for a simpler proof of subject reduction, and
in particular, do not require different type rules for Javar assignment to array
components and the other assignment statements. On the other hand, small step
semantics allows the description of co-routines. Figure 17 describes the evaluation
of variables, field and array access, and the creation of new objects or arrays.

Figure 18 describes statement execution. Statement sequences are evaluated
from left to right. In conditional statements the condition is evaluated first; if it
evaluates to true, then the first branch is executed, otherwise the second branch
is executed. A return statement terminates execution. A statement returning
an expression evaluates this expression until ground and replaces itself by this
ground value, thus modeling methods returning values.

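Variables (i.e. identifiers, instance variable access or array access) are evaluated
from left to right. The rules about assignment in figure 19 prevent an expression
like x, or $\iota_i$.[C]v, appearing on the left hand side of an assignment from being
rewritten further. They allow an expression of the form u.[C1]w.[C2]x.[C3]y
to be rewritten to an expression of the form $\iota_j$.[C3]y for some j. Furthermore,
there is no rule of the form $\langle \iota_j, \sigma \rangle \leadsto_p \ldots$. This is because there is no
explicit dereferencing operator in Java. Objects are passed as references, and they
are dereferenced only implicitly, when their fields are accessed.

\[ \langle id, \sigma \rangle \leadsto_p \langle \sigma(id), \sigma \rangle \qquad \langle \iota_i.[C]f, \sigma \rangle \leadsto_p \langle \sigma(\iota_i, f, C), \sigma \rangle \]

\[ \frac{\langle v, \sigma \rangle \leadsto_p \langle v', \sigma' \rangle}{\begin{array}{l} \langle v[e], \sigma \rangle \leadsto_p \langle v'[e], \sigma' \rangle \\ \langle v.[C]f, \sigma \rangle \leadsto_p \langle v'.[C]f, \sigma' \rangle \end{array}} \qquad \frac{\langle e, \sigma \rangle \leadsto_p \langle e', \sigma' \rangle}{\begin{array}{l} \langle \iota_i[e], \sigma \rangle \leadsto_p \langle \iota_i[e'], \sigma' \rangle \\ \langle \mathtt{null}[e], \sigma \rangle \leadsto_p \langle \mathtt{null}[e'], \sigma' \rangle \end{array}} \]

\[ \frac{k \text{ is integer value}}{\langle \iota_i[k], \sigma \rangle \leadsto_p \langle \sigma(\iota_i)[k], \sigma \rangle} \qquad \frac{k \text{ is integer value} \qquad \langle \mathtt{new}\ NullPE \ll \gg, \sigma \rangle \leadsto_p \langle \iota_i, \sigma' \rangle}{\begin{array}{l} \langle \mathtt{null}[k], \sigma \rangle \leadsto_p \langle \mathtt{throw}\ \iota_i, \sigma' \rangle \\ \langle \mathtt{null}.[C]f, \sigma \rangle \leadsto_p \langle \mathtt{throw}\ \iota_i, \sigma' \rangle \end{array}} \]

\[ \frac{\iota_i \text{ is new in } \sigma \qquad \sigma' = \sigma[\iota_i \mapsto \ll f_1\ C_1 : v_1, \ldots f_n\ C_n : v_n \gg^{C}]}{\langle \mathtt{new}\ C \ll f_1\ C_1\ v_1, \ldots f_n\ C_n\ v_n \gg, \sigma \rangle \leadsto_p \langle \iota_i, \sigma' \rangle} \]

\[ \frac{\iota_i \text{ is new in } \sigma \qquad v = v_0 = v_1 \ldots = v_{n-1},\ n \geq 0 \qquad \sigma' = \sigma[\iota_i \mapsto [\![ v_0, \ldots v_{n-1} ]\!]^{T[]}]}{\langle \mathtt{new}\ T[n][\![ v ]\!], \sigma \rangle \leadsto_p \langle \iota_i, \sigma' \rangle} \]

\[ \frac{1 \leq j \leq k,\ k \geq 1,\ m \geq 0 \qquad n_i \geq 0 \text{ for } i \in \{1 \ldots j-1\} \qquad \langle n_j, \sigma \rangle \leadsto_p \langle n_j', \sigma' \rangle}{\langle \mathtt{new}\ T[n_1] \ldots [n_j] \ldots [n_k][]_1 \ldots []_m[\![ v ]\!], \sigma \rangle \leadsto_p \langle \mathtt{new}\ T[n_1] \ldots [n_j'] \ldots [n_k][]_1 \ldots []_m[\![ v ]\!], \sigma' \rangle} \]

\[ \frac{m \geq 1,\ n \geq 0,\ k \geq 2 \qquad \iota_i \text{ new in } \sigma \qquad \sigma' = \sigma[\iota_i \mapsto [\![ null_0, \ldots null_{n-1} ]\!]^{T[]_1 \ldots []_k}]}{\langle \mathtt{new}\ T[n][]_2 \ldots []_k[\![ v ]\!], \sigma \rangle \leadsto_p \langle \iota_i, \sigma' \rangle} \]

\[ \frac{k \geq 1,\ m \geq 0 \qquad n_i \text{ ground},\ i \in \{1 \ldots k\} \qquad n_j < 0 \text{ for some } j \in \{1 \ldots k\} \qquad \langle \mathtt{new}\ NegSzeE \ll \gg, \sigma \rangle \leadsto_p \langle \iota_i, \sigma' \rangle}{\langle \mathtt{new}\ T[n_1] \ldots [n_k][]_1 \ldots []_m[\![ v ]\!], \sigma \rangle \leadsto_p \langle \mathtt{throw}\ \iota_i, \sigma' \rangle} \]

\[ \frac{\begin{array}{c} n_i \geq 0,\ k \geq 2,\ m \geq 0,\ \sigma_0 = \sigma \qquad T \text{ is a simple type} \\ \langle \mathtt{new}\ T[n_2] \ldots [n_k][]_1 \ldots []_m[\![ v ]\!], \sigma_l \rangle \leadsto_p \langle \iota_{j_l}, \sigma_{l+1} \rangle \text{ for all } l \in \{0 \ldots n_1 - 1\} \\ \iota_{j_{n_1}} \text{ is new in } \sigma_{n_1} \qquad \sigma' = \sigma_{n_1}[\iota_{j_{n_1}} \mapsto [\![ \iota_{j_0}, \ldots \iota_{j_{n_1 - 1}} ]\!]^{\ldots}] \end{array}}{\langle \mathtt{new}\ T[n_1] \ldots [n_k][]_1 \ldots []_m[\![ v ]\!], \sigma \rangle \leadsto_p \langle \iota_{j_{n_1}}, \sigma' \rangle} \]

Fig. 17. Expression execution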



Array access as described here adheres to the rules in ch. 15.12 of the Java
language specification, which require full evaluation of the expression to the left
of the brackets. Thus, with our operational semantics, the term a[(a := b)[3]]
corresponds to the term a[b[3]]; a := b.
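The same term can be tried in Java itself. In the sketch below, which is purely illustrative (the class name ArrayAccessOrder and the array contents are made up), the array reference a is evaluated before the index expression, so the access reads from the old array even though the index expression reassigns a.

```java
class ArrayAccessOrder {
    // Evaluates a[(a = b)[3]] and returns the value read.
    static int demo() {
        int[] a = {11, 12, 13, 14};
        int[] b = {0, 1, 2, 3};
        // The reference to the old array a is fixed first; the index expression
        // then assigns b to a and yields b[3], i.e. 3; the access reads old a[3].
        return a[(a = b)[3]];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 14
    }
}
```

The result is the same as reading a[b[3]] first and performing a := b afterwards, which is the correspondence stated above.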

The last six rules in figure 17 describe the creation of new objects or arrays,
c.f. ch. 15.8-15.9 of the Java language specification. Essentially, a new value of
the appropriate array or class type is created, and its address is returned. The
fields of the object, and the components of the array, are assigned initial values
(calculated at compile time) of the type to which they belong.

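\[ \frac{\langle stmts, \sigma \rangle \leadsto_p \langle \sigma' \rangle}{\langle stmts;\ stmt, \sigma \rangle \leadsto_p \langle stmt, \sigma' \rangle} \qquad \frac{\langle stmts, \sigma \rangle \leadsto_p \langle stmts', \sigma' \rangle}{\langle stmts;\ stmt, \sigma \rangle \leadsto_p \langle stmts';\ stmt, \sigma' \rangle} \]

\[ \langle \mathtt{if\ true\ then}\ stmts\ \mathtt{else}\ stmts', \sigma \rangle \leadsto_p \langle stmts, \sigma \rangle \qquad \langle \mathtt{if\ false\ then}\ stmts\ \mathtt{else}\ stmts', \sigma \rangle \leadsto_p \langle stmts', \sigma \rangle \]

\[ \frac{\langle e, \sigma \rangle \leadsto_p \langle e', \sigma' \rangle}{\langle \mathtt{if}\ e\ \mathtt{then}\ stmts\ \mathtt{else}\ stmts', \sigma \rangle \leadsto_p \langle \mathtt{if}\ e'\ \mathtt{then}\ stmts\ \mathtt{else}\ stmts', \sigma' \rangle} \]

\[ \langle \mathtt{return}, \sigma \rangle \leadsto_p \langle \sigma \rangle \qquad \frac{\langle e, \sigma \rangle \leadsto_p \langle e', \sigma' \rangle}{\langle \mathtt{return}\ e, \sigma \rangle \leadsto_p \langle \mathtt{return}\ e', \sigma' \rangle} \qquad \frac{val \text{ is ground}}{\langle \mathtt{return}\ val, \sigma \rangle \leadsto_p \langle val, \sigma \rangle} \]

Fig. 18. Statement execution

For example, for a state $\sigma_{00}$ the expression new int[2][3][][]$[\![ 0 ]\!]$ would be
executed as $\langle$ new int[2][3][][]$[\![ 0 ]\!]$, $\sigma_{00} \rangle \leadsto_{p'} \langle \iota_7, \sigma_{01} \rangle$, where $\iota_5$, $\iota_6$, and $\iota_7$ are new
in $\sigma_{00}$, and they have the following contents in $\sigma_{01}$:

$\sigma_{01}(\iota_5) = [\![ null, null, null ]\!]^{int[][][]}$

$\sigma_{01}(\iota_6) = [\![ null, null, null ]\!]^{int[][][]}$

$\sigma_{01}(\iota_7) = [\![ \iota_5, \iota_6 ]\!]^{int[][][][]}$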
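The shape of the state produced by this array creation can be compared with what Java builds for the same expression. The sketch below is only an illustration with a made-up class name; it inspects new int[2][3][][] and reports the outer length, the inner length, whether the innermost components are null, whether the two inner arrays are distinct objects, and whether the inner arrays have run-time type int[][][].

```java
class ArrayCreationDemo {
    // Summarizes the structure created by new int[2][3][][].
    static String describe() {
        int[][][][] a = new int[2][3][][];
        return a.length                               // outer array has 2 components
            + "," + a[0].length                       // each inner array has 3 components
            + "," + (a[0][0] == null)                 // unspecified dimensions start as null
            + "," + (a[0] != a[1])                    // the two inner arrays are distinct objects
            + "," + (a[0].getClass() == int[][][].class); // inner arrays are of type int[][][]
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints 2,3,true,true,true
    }
}
```

The two distinct inner arrays play the role of the first two new addresses, and the outer array that of the address returned by the rewrite.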

Figure 19 describes the evaluation of assignments. According to the first rule,
the left hand side is evaluated first, until it becomes l-ground. Then, according to
the next rule, the right hand side of the assignment is evaluated, up to the point
of obtaining a ground term. Assignment to variables or to object components
modifies the state accordingly.

The last three rules describe assignment to array components. The index
being within bounds has to be checked first (if not, IndOutBndE is thrown), then
the value has to fit the array (if not, ArrStoreE is thrown), and, if the two
above requirements are satisfied, then the assignment is performed. Fitting, a
requirement which ensures that an object or array value is of a type that can be
appropriately stored into another array, is described in definition 12.
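The order of the two checks can be observed in Java. The sketch below is illustrative only; the class name ArrayAssignChecks and the helper assign are hypothetical. An out-of-bounds index is reported even when the value would also fail the store check, and null is always accepted.

```java
class ArrayAssignChecks {
    // Performs arr[k] = val and reports which run-time check, if any, failed.
    static String assign(Object[] arr, int k, Object val) {
        try {
            arr[k] = val;
            return "assigned";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "index out of bounds";
        } catch (ArrayStoreException e) {
            return "array store";
        }
    }

    public static void main(String[] args) {
        Object[] strings = new String[2];
        System.out.println(assign(strings, 5, Integer.valueOf(1))); // bounds check comes first
        System.out.println(assign(strings, 1, Integer.valueOf(1))); // then the store check
        System.out.println(assign(strings, 1, "ok"));               // both checks pass
        System.out.println(assign(strings, 0, null));               // null fits any array
    }
}
```

The four calls correspond to the three array assignment rules, with the last two both falling under the rule for a value that fits.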

Other exceptions (e.g. null access) need not be considered in these rules,
because they would be checked by the variable rules (figure 17), and then
propagated by the exception rules from figure 21. Also, we have no rule of the form
$\langle \iota_j := value, \sigma \rangle \leadsto_p \ldots$. This is because in Java overwriting of objects is not
possible; only sending messages to them, or overwriting selected instance variables, is.

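Definition 12 A value val fits a type T = T'[] in a program p and a state $\sigma$, iff val is
primitive, or val = null, or $\sigma(val) = \ll \ldots \gg^{C}$ and $p \vdash C \sqsubseteq T'$, or $\sigma(val) = [\![ \ldots ]\!]^{T''}$
and $p \vdash T'' \sqsubseteq T'$.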

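Note that a primitive value fits any array type, e.g. 4 fits the type FrPhil[][][].
This is so because when primitive values are assigned to array components no
run time check needs to be performed, c.f. lemma 11. Also, note that in the above
definition the types T' and T'' may be array types themselves, and remember that
the subclass relationship is monotonic with the array type constructor (i.e.
$p \vdash C \sqsubseteq C'$ implies that $p \vdash C[] \sqsubseteq C'[]$).

\[ \frac{v \text{ is not l-ground} \qquad \langle v, \sigma \rangle \leadsto_p \langle v', \sigma' \rangle}{\langle v := e, \sigma \rangle \leadsto_p \langle v' := e, \sigma' \rangle} \qquad \frac{v \text{ is l-ground} \qquad \langle e, \sigma \rangle \leadsto_p \langle e', \sigma' \rangle}{\langle v := e, \sigma \rangle \leadsto_p \langle v := e', \sigma' \rangle} \]

\[ \frac{val \text{ is ground} \qquad id \text{ is an identifier}}{\begin{array}{l} \langle id := val, \sigma \rangle \leadsto_p \langle \sigma[id \mapsto val] \rangle \\ \langle \iota_i.[C]v := val, \sigma \rangle \leadsto_p \langle \sigma[\iota_i, v, C \mapsto val] \rangle \end{array}} \]

\[ \frac{\begin{array}{c} val, k \text{ are ground} \qquad \sigma(\iota_i) = [\![ val_0 \ldots val_{n-1} ]\!]^{T[]_1 \ldots []_m} \\ 0 > k \text{ or } k > n-1 \qquad \langle \mathtt{new}\ IndOutBndE \ll \gg, \sigma \rangle \leadsto_p \langle \iota_j, \sigma' \rangle \end{array}}{\langle \iota_i[k] := val, \sigma \rangle \leadsto_p \langle \mathtt{throw}\ \iota_j, \sigma' \rangle} \]

\[ \frac{\begin{array}{c} val, k \text{ are ground} \qquad \sigma(\iota_i) = [\![ val_0 \ldots val_{n-1} ]\!]^{T[]_1 \ldots []_m} \qquad 0 \leq k \leq n-1 \\ val \text{ does not fit } T[]_1 \ldots []_m \text{ in } p, \sigma \qquad \langle \mathtt{new}\ ArrStoreE \ll \gg, \sigma \rangle \leadsto_p \langle \iota_j, \sigma' \rangle \end{array}}{\langle \iota_i[k] := val, \sigma \rangle \leadsto_p \langle \mathtt{throw}\ \iota_j, \sigma' \rangle} \]

\[ \frac{\begin{array}{c} val, k \text{ are ground} \qquad \sigma(\iota_i) = [\![ val_0 \ldots val_{n-1} ]\!]^{T[]_1 \ldots []_m} \qquad 0 \leq k \leq n-1 \\ val \text{ fits } T[]_1 \ldots []_m \text{ in } p, \sigma \end{array}}{\langle \iota_i[k] := val, \sigma \rangle \leadsto_p \langle \sigma[\iota_i, k \mapsto val] \rangle} \]

Fig. 19. Assignment execution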



\[ \frac{val_i \text{ is ground for } i \in \{1 \ldots k-1\},\ n \geq k \geq 1 \qquad \langle e_k, \sigma \rangle \leadsto_p \langle e_k', \sigma' \rangle}{\langle val_1.[AT]m(val_2, \ldots val_{k-1}, e_k, \ldots e_n), \sigma \rangle \leadsto_p \langle val_1.[AT]m(val_2, \ldots val_{k-1}, e_k', \ldots e_n), \sigma' \rangle} \]

\[ \frac{val_i \text{ is ground for } i \in \{2 \ldots n\},\ n \geq 1}{\langle \mathtt{null}.[AT]m(val_2, \ldots val_n), \sigma \rangle \leadsto_p \langle \mathtt{throw\ new}\ NullPE \ll \gg, \sigma \rangle} \]

\[ \frac{\begin{array}{c} n \geq 1 \qquad val_i \text{ is ground for } i \in \{1 \ldots n\} \qquad \sigma(val_1) = \ll \ldots \gg^{C} \qquad AT = T_2 \times \ldots \times T_n \\ MethBody(m, AT, C, p) = (C', m \text{ is } \lambda x_2 : T_2 \ldots \lambda x_n : T_n . \{stmts\}) \\ z_i \text{ are new identifiers in } \sigma \qquad \sigma' = \sigma[z_1 \mapsto val_1] \ldots [z_n \mapsto val_n] \\ stmts' = stmts[z_1/\mathtt{this}, z_2/x_2, \ldots z_n/x_n] \end{array}}{\langle val_1.[AT]m(val_2, \ldots val_n), \sigma \rangle \leadsto_p \langle stmts', \sigma' \rangle} \]

Fig. 20. Evaluation of method call

Figure 20 describes the evaluation of method calls. The receiver and argument
expressions are evaluated left to right, c.f. ch. 9.3. The first
rule describes rewriting the k-th expression, where all the previous expressions
(i.e. $val_i$, $i \in \{1 \ldots k-1\}$) are ground. The second rule requires the exception
NullPE to be thrown if the receiver is null. The third rule describes dynamic
method look up, taking into account the argument types, and the statically
calculated method descriptor AT. The term t[t'/x] has the usual meaning of
replacing the variable x by the term t' in the term t.
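Both aspects of the method call rules, the statically fixed descriptor combined with dynamic look up, and the late null check on the receiver, can be seen in plain Java. The sketch below is illustrative: the method bodies, the overload think(FrPhil) and the helper names are assumptions and do not reproduce the philosophers program.

```java
class MethodCallDemo {
    static class Phil {
        String think(Phil other) { return "Phil.think(Phil)"; }
    }

    static class FrPhil extends Phil {
        @Override String think(Phil other) { return "FrPhil.think(Phil)"; }
        String think(FrPhil other) { return "FrPhil.think(FrPhil)"; }
    }

    // The descriptor think(Phil) is chosen at compile time from the static
    // types; the body is then looked up in the run-time class of the receiver.
    static String call() {
        Phil aPhil = new FrPhil(); // static type Phil, run-time class FrPhil
        return aPhil.think(aPhil);
    }

    static int evaluated = 0;

    static Phil argument() {
        evaluated++;
        return new Phil();
    }

    // Returns true iff the argument was evaluated before the null receiver
    // caused a NullPointerException.
    static boolean argumentsRunBeforeNullCheck() {
        Phil receiver = null;
        evaluated = 0;
        try {
            receiver.think(argument());
            return false; // not reached
        } catch (NullPointerException e) {
            return evaluated == 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(call());
        System.out.println(argumentsRunBeforeNullCheck());
    }
}
```

Although the argument is a FrPhil at run time, the overload think(FrPhil) is never considered, because the descriptor was fixed from the static type Phil.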

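Execution of the method call aPhil.[Phil]think(aPhil) results in the
following rewrites:

\[ \langle aPhil.[Phil]think(aPhil), \sigma_0 \rangle \leadsto_{p'} \langle \iota_1.[Phil]think(aPhil), \sigma_0 \rangle \leadsto_{p'} \langle \iota_1.[Phil]think(\iota_1), \sigma_0 \rangle \leadsto_{p'} \langle (w.[FrPhil]like := oyster; \ldots), \sigma_1 \rangle \leadsto_{p'} \langle (\ldots), \sigma_2 \rangle \]

where $\sigma_1$, $\sigma_2$ are:

$\sigma_1(aPhil) = \sigma_0(aPhil) = \iota_1$

$\sigma_1(oyster) = \sigma_0(oyster) = \iota_3$

$\sigma_1(w) = \iota_1$

$\sigma_1(w') = \iota_1$

$\sigma_1(\iota_1) = \sigma_0(\iota_1) = \ll like\ Phil : \iota_2,\ like\ FrPhil : null \gg^{FrPhil}$

$\sigma_1(\iota_2) = \sigma_0(\iota_2) = \ldots$

$\sigma_1(\iota_3) = \sigma_0(\iota_3) = \ldots$

$\sigma_2(z) = \sigma_1(z) \quad \forall z \neq \iota_1$

$\sigma_2(\iota_1) = \ll like\ Phil : \iota_2,\ like\ FrPhil : \iota_3 \gg^{FrPhil}$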

The rules in figure 21 describe the operational semantics for propagation
and handling of exceptions. Thus, $\langle$ try throw new E$\ll \gg$; f(x)... catch $E_1$ $v_1$ $stmts_1$, $\sigma \rangle$
would rewrite to $\langle$ try throw $\iota_i$ catch $E_1$ $v_1$ $stmts_1$, $\sigma \rangle$, then to $\langle stmts_1', \sigma \rangle$, if E
is a subclass of $E_1$. During execution the term maintains its type, which is void;
the subterm throw $\iota_i$ has the type E-Thrn.
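The selection of a handler, and the treatment of finally when no handler matches, can be tried out in Java. The sketch below is illustrative only; the exception classes E1 and E and the method run are made-up names that mirror the example above, with E a subclass of E1.

```java
class HandlerSelection {
    static class E1 extends Exception {}
    static class E extends E1 {}

    // Throws `thrown` inside a try statement and records which clause ran.
    static String run(Exception thrown) {
        StringBuilder log = new StringBuilder();
        try {
            try {
                throw thrown;
            } catch (E1 v1) {                 // first clause whose class is a superclass of the thrown class
                log.append("caught by E1;");
            } catch (RuntimeException v2) {
                log.append("caught by RuntimeException;");
            } finally {
                log.append("finally;");       // runs whether or not a clause matched
            }
        } catch (Exception propagated) {      // reached only if no inner clause matched
            log.append("propagated");
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(new E()));
        System.out.println(run(new Exception()));
    }
}
```

When no clause matches, the finally block runs first and the exception is then rethrown, which corresponds to the rule that rewrites such a term to the finally statements followed by the throw.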

9 Soundness of the Javas Type System

9.1 Conforming Environments and States 

We require objects to be constructed according to their class, array values to 
conform to their dimension and to consist of values of appropriate types, and 
variables to contain values of the appropriate type. Furthermore, an environ- 
ment that contains all definitions from another environment, plus possibly some 
additional variable definitions is said to conform to the second environment. 

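Definition 13 A value val weakly conforms to a type T in an environment $\Gamma$
and a state $\sigma$ iff:

- val is a primitive value, T is a primitive type, and $val \in T$, or

- val = null, and T is a class, interface or array type, or

- $val = \iota_j$, $\sigma(\iota_j) = \ll \ldots \gg^{C}$, and $\Gamma \vdash C \leq_{wdn} T$, or

- $val = \iota_j$, $\sigma(\iota_j) = [\![ \ldots ]\!]^{T'[]_1 \ldots []_k}$, and $\Gamma \vdash T'[]_1 \ldots []_k \leq_{wdn} T$.

A value val conforms to a type T in an environment $\Gamma$ and a state $\sigma$ iff val
weakly conforms to T in $\Gamma$ and $\sigma$ and

- $val = \iota_j$, $\sigma(\iota_j) = \ll v_1\ C_1 : val_1, \ldots v_n\ C_n : val_n \gg^{C}$, and $\forall$ labels v, classes
  C', types T' with $(C', T') \in FDecs(\Gamma, C, v)$, $\exists k \in \{1 \ldots n\}$ with $v_k = v$, $C_k = C'$,
  and $val_k$ weakly conforms to T' in $\Gamma$ and $\sigma$; or

- $val = \iota_j$, $\sigma(\iota_j) = [\![ val_0, \ldots val_n ]\!]^{T'[]_1 \ldots []_k}$ and $\forall i \in \{0 \ldots n\}$ : $val_i$ weakly
  conforms to $T'[]_2 \ldots []_k$.

Furthermore, a state $\sigma$ conforms to an environment $\Gamma$ iff for all identifiers x,
and integers i

- if $\Gamma(x) \neq Undef$ then $\sigma(x)$ conforms to $\Gamma(x)$ in $\Gamma, \sigma$;

- if $\sigma(\iota_i) = \ll \ldots \gg^{C}$, then $\iota_i$ conforms to C in $\Gamma, \sigma$;

- if $\sigma(\iota_i) = [\![ \ldots ]\!]^{T[]_1 \ldots []_n}$ then $\iota_i$ conforms to $T[]_1 \ldots []_n$ in $\Gamma, \sigma$.

Finally, an environment $\Gamma$ conforms to environment $\Gamma'$ iff for any identifier x:

- $\Gamma'(x) \neq Undef$ implies $\Gamma(x) = \Gamma'(x)$;

- $\Gamma'(x) = Undef \neq \Gamma(x)$ implies that $\Gamma(x)$ is a variable.

For example, the state $\sigma_0$ from section 8 conforms to the environment $\Gamma_0$. The
“fitting” requirement from definition 12 is weaker than conforming. Also,
conforming is defined in terms of an environment, whereas fitting is defined in terms
of the more restricted information that is available in the program.

\[ \frac{\langle e, \sigma \rangle \leadsto_p \langle e', \sigma' \rangle \qquad cont\lceil \cdot \rceil \text{ a context}}{\langle cont\lceil \mathtt{throw}\ e \rceil, \sigma \rangle \leadsto_p \langle cont\lceil \mathtt{throw}\ e' \rceil, \sigma' \rangle} \qquad \frac{cont\lceil \cdot \rceil \text{ a context}}{\langle cont\lceil \mathtt{throw}\ \iota_i \rceil, \sigma \rangle \leadsto_p \langle \mathtt{throw}\ \iota_i, \sigma \rangle} \]

\[ \langle \mathtt{throw\ null}, \sigma \rangle \leadsto_p \langle \mathtt{throw\ new}\ NullPE \ll \gg, \sigma \rangle \]

\[ \frac{\langle stmts, \sigma \rangle \leadsto_p \langle \sigma' \rangle}{\begin{array}{l} \langle \mathtt{try}\ stmts\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n, \sigma \rangle \leadsto_p \langle \sigma' \rangle \\ \langle \mathtt{try}\ stmts\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n\ \mathtt{finally}\ stmts_{n+1}, \sigma \rangle \leadsto_p \langle stmts_{n+1}, \sigma' \rangle \end{array}} \]

\[ \frac{\langle stmts, \sigma \rangle \leadsto_p \langle stmts', \sigma' \rangle}{\begin{array}{l} \langle \mathtt{try}\ stmts\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n, \sigma \rangle \leadsto_p \langle \mathtt{try}\ stmts'\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n, \sigma' \rangle \\ \langle \mathtt{try}\ stmts\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n\ \mathtt{finally}\ stmts_{n+1}, \sigma \rangle \\ \quad \leadsto_p \langle \mathtt{try}\ stmts'\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n\ \mathtt{finally}\ stmts_{n+1}, \sigma' \rangle \end{array}} \]

\[ \frac{\sigma(\iota_i) = \ll \ldots \gg^{E} \qquad \forall k \in \{1 \ldots n\} \text{ NOT } p \vdash E \sqsubseteq E_k}{\begin{array}{l} \langle \mathtt{try\ throw}\ \iota_i\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n, \sigma \rangle \leadsto_p \langle \mathtt{throw}\ \iota_i, \sigma \rangle \\ \langle \mathtt{try\ throw}\ \iota_i\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n\ \mathtt{finally}\ stmts_{n+1}, \sigma \rangle \leadsto_p \langle stmts_{n+1};\ \mathtt{throw}\ \iota_i, \sigma \rangle \end{array}} \]

\[ \frac{\begin{array}{c} \sigma(\iota_i) = \ll \ldots \gg^{E} \qquad \exists j \in \{1 \ldots n\} : p \vdash E \sqsubseteq E_j \text{ AND } \forall k \in \{1 \ldots j-1\} \text{ NOT } p \vdash E \sqsubseteq E_k \\ stmts' = stmts_j[z/v_j],\ z \text{ new in } stmts_j \text{ and in } \sigma \qquad \sigma' = \sigma[z \mapsto \iota_i] \end{array}}{\begin{array}{l} \langle \mathtt{try\ throw}\ \iota_i\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n, \sigma \rangle \leadsto_p \langle stmts', \sigma' \rangle \\ \langle \mathtt{try\ throw}\ \iota_i\ \mathtt{catch}\ E_1\ v_1\ stmts_1 \ldots \mathtt{catch}\ E_n\ v_n\ stmts_n\ \mathtt{finally}\ stmts_{n+1}, \sigma \rangle \leadsto_p \langle \mathtt{try}\ stmts'\ \mathtt{finally}\ stmts_{n+1}, \sigma' \rangle \end{array}} \]

Fig. 21. Exception throwing, propagation and handling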

The following lemma states that conforming environments preserve all prop- 
erties. 

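Lemma 8 Given environments $\Gamma$, $\Gamma'$, where $\Gamma$ conforms to $\Gamma'$, any term t,
types T, T', program p, and argument types $AT = T_2 \times \ldots \times T_n$:

- $\Gamma \vdash \diamond \implies \Gamma' \vdash \diamond$;

- $\Gamma' \vdash p\ \diamond \implies \Gamma \vdash p\ \diamond$;

- $\Gamma' \vdash T \leq_{wdn} T' \implies \Gamma \vdash T \leq_{wdn} T'$;

- $\Gamma' \vdash t : T \implies \Gamma \vdash t : T$;

- $FirstFit(\Gamma, m, T', AT) = FirstFit(\Gamma', m, T', AT)$;

- $\Gamma' \vdash_{se} t : T \implies \Gamma \vdash_{se} t : T$;

- $\Gamma', \sigma \vdash_r t : T \implies \Gamma, \sigma \vdash_r t : T$.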

9.2 Properties of Term Evaluation 

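The operational semantics is deterministic up to renaming of addresses and
identifiers. A term containing an actually thrown exception not enclosed by a
try statement, i.e. one with the type E-Thrn, will either not terminate, or it
will terminate in a throw statement. Rewriting variables on the left hand side of
assignments does not make their type more specific, except for arrays. Program
execution may modify the contents of arrays and objects, but will not change
their type or class:

Lemma 9 For a state $\sigma$ conforming to a well-formed environment $\Gamma$, a Javase
program p with $\Gamma \vdash p\ \diamond$, a well-typed Javar term t:

- $\langle t, \sigma \rangle \leadsto_p \langle t', \sigma' \rangle$ and $\langle t, \sigma \rangle \leadsto_p \langle t'', \sigma'' \rangle$ implies that $t' = t''$, $\sigma' = \sigma''$ up to
  renaming of addresses and identifiers. Also, $\langle t, \sigma \rangle \leadsto_p \langle \sigma' \rangle$ and $\langle t, \sigma \rangle \leadsto_p \langle \sigma'' \rangle$
  implies that $\sigma' = \sigma''$ up to renaming of addresses and identifiers. Furthermore,
  it is impossible to have $\langle t, \sigma \rangle \leadsto_p \langle t'', \sigma'' \rangle$ and $\langle t, \sigma \rangle \leadsto_p \langle \sigma' \rangle$.

- If $\Gamma, \sigma \vdash_r t : E\text{-Thrn}$, then either $\langle t, \sigma \rangle \leadsto_p^{*}$ does not terminate, or
  $\langle t, \sigma \rangle \leadsto_p^{*} \langle \mathtt{throw}\ \iota_i, \sigma' \rangle$.

- For Javar variables v, v', if $\langle v, \sigma \rangle \leadsto_p \langle v', \sigma' \rangle$, and $\Gamma, \sigma \vdash_r v : T$, and v is
  not l-ground, then $\Gamma, \sigma' \vdash_r v' : T'$, $\Gamma \vdash T' \leq_{wdn} T$ and v' is not ground.
  Furthermore, if v is not an array access, then $T = T'$.

- If $\langle t, \sigma \rangle \leadsto_p \langle t', \sigma' \rangle$, then for any $\iota_i$, if $\sigma(\iota_i) = [\![ \ldots ]\!]^{T[]_1 \ldots []_n}$ then $\sigma'(\iota_i) =
  [\![ \ldots ]\!]^{T[]_1 \ldots []_n}$, and if $\sigma(\iota_i) = \ll \ldots \gg^{C}$ then $\sigma'(\iota_i) = \ll \ldots \gg^{C}$.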

Don Syme pointed out to us that a lemma stating that program
execution preserves types up to widening is necessary for the proof of subject
reduction. Interestingly, it turned out that a stronger lemma than that originally
suggested and used in the subject reduction theorem is possible, namely that
execution of a program does not have any effect on the type of an expression. This
lemma is easier to prove, and considerably facilitates the proof of subject reduction.

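Lemma 10 For Javar terms t, t', t'', states $\sigma$, $\sigma'$, environments $\Gamma$, $\Gamma'$, types
T, T'', Javas program p and Javar program $p' = \mathcal{C}\{\Gamma, p\}$, if

- $\Gamma \vdash p\ \diamond$ and $\Gamma, \sigma \vdash_r t : T$ and $\Gamma, \sigma \vdash_r t'' : T''$;

- $\sigma$ conforms to $\Gamma$ and $\Gamma'$ conforms to $\Gamma$;

- $\langle t, \sigma \rangle \leadsto_{p'} \langle t', \sigma' \rangle$, or $\langle t, \sigma \rangle \leadsto_{p'} \langle \sigma' \rangle$

then

- $\Gamma', \sigma' \vdash_r t'' : T''$.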

The lemma may be surprising: as stated later in the subject reduction theorem, a
term t when rewritten to a new term t' has, possibly, a narrower type; therefore,
one would expect evaluation of the term t to affect the type of a third term
t''. However, according to the above lemma, even if t'' should contain t as a
subterm, its type does not change. The lemma is proven by structural induction
over term execution (i.e. on $\langle t, \sigma \rangle \leadsto_{p'} \langle t', \sigma' \rangle$, or $\langle t, \sigma \rangle \leadsto_{p'} \langle \sigma' \rangle$), and then, each
case by structural induction on the typing of t'' (i.e. on $\Gamma, \sigma \vdash_r t'' : T''$). The
interesting cases are those where the state changes, i.e. the application of the
three different assignment rules from figure 19. Assignments do not change the
types of variables (these are looked up in the environment). They do not change
the type of addresses (as shown in lemma 9). They do not change the type of
array access because this depends on the type of the array and not on the type
of the actual array component. And they do not change the type of object access
because this too depends on the type of the object and the class stored in the
descriptor and not on the value stored in the object field.

The array property, introduced in the following definition, ensures that check- 
ing for fitting when executing array assignments will be sufficient to preserve 
conformance of the state. 

Definition 14 A Javar term t has the array property for a program p and for
a state $\sigma$, iff for any subterm of t with the form v[e] := e', with $\Gamma, \sigma \vdash_r v[e] : T$
and $\Gamma, \sigma \vdash_r e' : T'$, if NOT $\Gamma \vdash T' \leq_{wdn} T$, then for appropriate $n \geq 0$,
$T = C[]_1 \ldots []_n$, $T' = C'[]_1 \ldots []_n$, and NOT $p \vdash C' \sqsubseteq C$.

The array property is trivially guaranteed in type correct Javase terms, and
thus in any Javar terms that are the result of enriching type-correct Javas terms,
and it is preserved by the execution of Javar terms.

Lemma 11 For an environment $\Gamma$ under which the Javas term t and Javase
term t' are well typed, Javase program p with $\Gamma \vdash p\ \diamond$, $p' = \mathcal{C}\{\Gamma, p\}$:

- t' has the array property for p and any state $\sigma$.

- $\mathcal{C}\{\Gamma, t\}$ has the array property for p and any state $\sigma$.

- If $\sigma$ conforms to $\Gamma$, t' has the array property for p', $\sigma$, and $\langle t', \sigma \rangle \leadsto_{p'} \langle t'', \sigma' \rangle$,
  then t'' has the array property for p' and $\sigma'$.

- $\forall$ Javar terms t'', states $\sigma$: t'' has the array property for p and $\sigma$ $\implies$
  t'' has the array property for p' and $\sigma$

9.3 Subject Reduction and Soundness

The subject reduction theorem says that any non-ground well-typed Java_se term
either rewrites to another well-typed term of a type that can be widened to 
the type of the original term, or it rewrites to an exception. Furthermore, the 
state remains consistent with the environment. The subject reduction theorem 
of this paper is stronger than usual subject reduction theorems: not only does 
it guarantee that rewriting preserves types, but it also guarantees that a rewrite 
step exists for any well-formed, non-ground term. (In that sense it combines the
safety and soundness property described in chapter 4 of this book.) In particular, 
it guarantees for statically type-correct expressions, that the situation where an 
object cannot execute a message (the Smalltalk counterpart to “object does 
not understand message”) will never occur. On the other hand, it does not 
preclude the usual run-time errors like index out of bound, or wrong assignment 
to array components; however, it does guarantee that such erroneous situations 
will raise an exception, as opposed to going unnoticed and corrupting the run- 
time environment. 

Theorem 1 Subject Reduction For a state $\sigma$ that conforms to an environment
$\Gamma$, a Java_se program $p$ with $\Gamma \vdash_{se} p$, a non-ground Java_r term $t$ with
the array property for $p$ and $\sigma$, and a type $T$ with $\Gamma, \sigma \vdash_r t : T$, there exist $\sigma'$,
$\Gamma'$, $t'$, $T'$ such that:

— $(t, \sigma) \leadsto_p (t', \sigma')$, and $\Gamma', \sigma' \vdash_r t' : T'$, and $t'$ has the array property for $p$
and $\sigma'$, and:

• $T' =$ E-Thrn, $E$ an exception, $\sigma'$ conforms to $\Gamma$, $\Gamma' = \Gamma$
or

• $\Gamma \vdash T' \le_{wdn} T$, $\Gamma'$ conforms to $\Gamma$, $\sigma'$ conforms to $\Gamma'$
or

— $(t, \sigma) \leadsto_p (\sigma')$ and $\sigma'$ conforms to $\Gamma$

Furthermore, if $t$ is a non l-ground variable, then $(t, \sigma) \leadsto_p (t', \sigma')$ and $t'$ is not
ground. Also, if $t$ is a non l-ground variable which isn't an array access, then
$T = T'$.

The theorem is proven by structural induction over the derivation of $\Gamma, \sigma \vdash_r t : T$.

When the method call aPhil.[Phil]think(aPhil) was evaluated in the
philosophers example, after the third rewrite step the “environment extension”
required by the subject reduction theorem is $\Gamma' = \Gamma_0, w : \mathrm{FrPhil}, w' : \mathrm{FrPhil}$.
The states $\sigma_1$, $\sigma_2$ conform to $\Gamma'$.

Finally, the soundness theorem states that execution of a well-typed Java_s program
will produce a uniquely defined value of the expected type in a state conforming
to the definitions, or it will throw an exception which will be propagated
to the outermost level, or it will not terminate.

Theorem 2 Soundness Take any Java_s term $t$, a well-formed environment
$\Gamma$, a type $T$ with $\Gamma \vdash t : T$, a Java_s program $p$ with $\Gamma \vdash p$ and a state $\sigma$
that conforms to $\Gamma$. Then for the Java_se program $p'$, $p' = C\{\Gamma, p\}$, there exists
a unique Java_r term $t'$, and a state $\sigma'$, such that:

— $T \neq$ void, $(C\{\Gamma, t\}, \sigma) \leadsto^{*}_{p'} (t', \sigma')$, $t'$ is ground,
$\exists T' : \Gamma, \sigma' \vdash_r t' : T'$, $\Gamma \vdash T' \le_{wdn} T$ and $\sigma'$ conforms to $\Gamma$ or

— $T =$ void, and $(C\{\Gamma, t\}, \sigma) \leadsto^{*}_{p'} (\sigma')$ and $\sigma'$ conforms to $\Gamma$ or

— $(C\{\Gamma, t\}, \sigma) \leadsto^{*}_{p'}$ does not terminate or

— $(C\{\Gamma, t\}, \sigma) \leadsto^{*}_{p'} (\mathrm{throw}\ \iota_i, \sigma')$, and $\sigma'(\iota_i) = \langle\langle \ldots \rangle\rangle^{E}$ and $\Gamma \vdash E \sqsubseteq$ Exception

10 Conclusions 

We have given a formal description of the operational semantics and type sys- 
tem for a substantial subset of Java. We believe this subset is reasonably rich 
and contains many of the features which together might have led to difficul- 
ties in the Java type system. By applying some simplifications we obtained a 
straightforward system, which, we think, does not diminish the application of 
our results. 

Close scrutiny of the language description showed that the semantic is- 
sues related to the scope of our investigation are unambiguously answered by 
However, we found areas that could have been defined more generally (e.g. the
return types of methods override those from superclasses and superinterfaces)
and others that could have been defined more concisely (e.g. the descriptions
of widening and of exceptions). Furthermore, in we describe problems
related to the definition of binary compatibility, and attempt a formalization of
this concept.

We believe that the formal system we have developed is very near to Java and 
to programmers’ intuitive ideas about program execution. On the other hand, 
we now have a large system, and the proofs of the lemmas require the consid- 
eration of many cases. The system grew and evolved through many iterations, 
during which some omissions crept into the argumentation. The most significant
omissions were uncovered by Don Syme and are described earlier in this paper,
and also in the next chapter of this book. With the modifications he suggested,
he was able to validate the subject reduction theorem using his theorem checker.
This gives us greater confidence in our results, but it also underlines the
importance of using theorem checkers for such rather large systems.

Another proof of the soundness of the Java type system, using Isabelle in a 
large step semantics is described in Applications of theorem provers for 

programming language properties are also described in 

We aim to extend the language subset to describe a larger part of Java, and 
we also hope that our approach may serve as the basis for other studies on 
the language and its possible extensions looking at further 

language properties such as an abstraction property and binary compatibility 
1 Introduction

This chapter describes a machine-checked proof of the type soundness of a subset
of Java (we call this subset Java_S). In Chapter 3, a formal semantics for
approximately the same subset was presented by Drossopoulou and Eisenbach.
The work presented here serves two roles: it complements the written semantics
by correcting and clarifying some details; and it demonstrates the utility of
formal machine checking when exploring a large and detailed proof based on
operational semantics.^1

This work contributes to three distinct fields of formal reasoning: 

— The Formal Study of Java: We contribute a detailed analysis of a significant 
property of Java, and provide corrections to proofs that are interesting in 
their own right. 

— Tools for Formal Methods: This work is a major case study in so-called 
‘declarative’ proof techniques. The tool we use, called DECLARE [Sym97], 
has been developed by the author to demonstrate the utility of these tech- 
niques. 

— Formally Checked Properties of Languages: This work contributes a tool 
and a methodology for the general task of machine checking properties of 
languages. 

Most of this chapter should be clear to readers with a basic understanding of 
operational semantics, formal specification and the results presented in Chap- 
ter 3. 

Our main aim has not been to find errors. However, a significant error in 
the original formulation adopted by Drossopoulou and Eisenbach [DE97] was 
discovered during our work. We also independently rediscovered a significant 
error in the Java Language Specification [GJS96]. Both errors are described in 
Section 6. 

^1 The latest version of the proofs and specifications described in this document is
available on the World Wide Web at
http://www.cl.cam.ac.uk/users/drsl004/java-proofs.html. This will be updated to
reflect further work on the formalization.
The first error resulted in an interesting collaboration between Drossopoulou,
Eisenbach and the present author, and led to a deeper understanding of the prob- 
lems involved. It is as a result of this exercise that we discuss methodology in this 
chapter, because the methodology we adopted enabled us to find errors quickly 
and to provide good feedback to the authors of Chapter 3. This demonstrates 
the positive role that machine checking can play when used in conjunction with 
existing techniques. 



1.1 Outline 

This chapter is organized as follows. The rest of this introduction describes, in 
general terms, just what we have proved, and how we have gone about doing 
it. Section 2 delves into the technical content of our model of Java_S, and puts
into place the building blocks necessary for the proof. As our semantics is based 
heavily on that of Chapter 3, we only give details where our analysis departs 
from theirs. 

In Sections 4 and 5 we describe the process of machine checking this proof 
in detail, taking us from a higher-order logic formalization of the problem to a 
completed proof script. In Section 6 the errors we have mentioned are described, 
and we summarize and discuss related work in Section 7. 



1.2 What Have We Proved? 

An introduction to the notion of type soundness has already been given at the 
beginning of Part 2. Briefly, type soundness states that a well-typed Java program
will not ‘go wrong’ at runtime, in the sense that it will never reach a state
that violates conditions implied by the typing rules. To illustrate, one aspect of 
type soundness is captured in the following statement that is taken directly from 
the Java Language Specification [GJS96]: 

The type [of a variable or expression] limits the possible values that the 
variable can hold or the expression can produce at runtime. If a runtime 
value is a reference that is not null, it refers to an object or array that 
has a class . . . that will necessarily be compatible with the compile-time 
type. 

In this study we are concerned with the Java language itself, rather than the 
Java Virtual Machine (JVM). The two are closely related but the difference is 
non-trivial: for example there are JVM bytecodes that do not correspond to any 
Java text. Thus it remains a challenge to formalize and verify the corresponding 
type soundness property for the JVM. However, unlike many high-level/low-level 
language combinations (e.g. C++/assembler) the type systems of Java and the
JVM are closely related, and a comprehensive study of the former is a useful 
precursor to the study of the latter (see also [Qia97]). Of course, even if an 
abstract model of Java and/or the JVM is verified, this does not guarantee the 
soundness of a particular implementation. 
The precise formulation of type soundness we use is described in Section 3,
but, not surprisingly, it must be expressed in terms of the inner workings of a
runtime machine, in our case the execution model we use for Java_S. This helps
explain why it takes so much infrastructure before we can even state type sound- 
ness explicitly. Ultimately we would like to verify various “security properties” 
that are independent of the inner workings of the particular runtime model, but 
it is beyond the scope of this work to demonstrate such properties. 

The Java subset we consider here is that covered in version 2.01 of Drossopoulou
and Eisenbach’s paper.^2 It includes primitive types, classes with inheritance,
instance variables and instance methods, interfaces, shadowing of instance vari- 
ables, dynamic method binding, statically resolvable overloading of methods, 
object creation, null pointers, arrays and a minimal treatment of exceptions. An 
advantage of the approach to formalization we take in this work is that as new 
features of the language are treated it will be possible to incrementally adjust 
existing definitions and proofs. 



1.3 Five Steps to a Formalized, Machine Checked, Human Readable 
Proof of Java Type Soundness 

In this chapter we are largely concerned with how we prove type soundness, 
to the point that a machine can check our proof. Here we step back to look at 
the methodology in general, to understand what we learn at each stage of the 
process. The end result of the methodology is a proof outline that is machine 
checkable, human readable and maintainable as further features are added to our 
language. A feature of the methodology is that valuable feedback is provided to 
language researchers at each step. 

The steps of the methodology are as follows: 

1. Understand the Problem 

This first step is so obvious it should hardly need stating: we must develop 
a strong understanding of the problem before we proceed. Like all theorem 
provers, the tool we use, called DECLARE [Sym97], should only be used 
when this has been achieved.^3

2. Develop a Machine Acceptable Model 

This involves developing a machine acceptable model of the system, in our 
case as a DECLARE specification. This process typically uncovers many sig- 
nificant errors and omissions in the original specification, and complications 
arise, e.g. those that arise from the use of a formal logic rather than informal 
mathematical notation. 

^2 This version was distributed only on the WWW, and is no longer directly available.
If a version is needed for reference please contact the authors.

^3 We state this explicitly because some previous attempts at machine-checked language
formalization have held that machine formalization should (somehow) be used to 
reveal the underlying theory (this can be seen by the fact that the theory was not 
worked out in significant detail prior to using the machine). The two can be done 
concomitantly but one should not be pursued at the expense of the other. 
3. Validate the Model by Generating an Interpreter and Running
Test Cases 

If specifications were always perfect, then systems probably would be as 
well, and there would be little need for formal methods. However, specifi- 
cations nearly always contain mistakes, and thus some process of validation 
is required. Thus, we must attempt to check that the logical specifications 
represent a valid model of the Java_S language. This validation is of course
non-trivial, and the tools required to perform validation (notably the ability 
to execute specifications when they fall in an executable subset) are rarely 
provided by the theorem proving community. Researchers will often rely on 
the process of proof to debug their specifications, a tedious exercise that is 
not particularly effective. 

We have used two main techniques for validation: typechecking (which is 
easy as DECLARE is based on higher order logic), and the automatic gener- 
ation of ML code for an interpreter, directly from the specification. It is not 
possible to remove all mistakes in the specification via these techniques, but
a surprising number are caught.

4. Formulate All Key Properties 

We should now have a valid model of the Java_S language, in a form that the
computer can accept. We now write the properties that we expect to hold of 
the specification. Though this may seem simple, it invariably isn’t: formu- 
lating properties can take as much work as formulating a model, especially 
for properties of programming languages. Because writing in a formal logic 
requires attention to detail, this process can uncover many mistakes. 

5. Sketch Outlines of the Proofs and Refine the Proofs Until the 
Proof Checker is Convinced 

We now turn to the formal proofs of the propositions we have developed. 
This involves writing ‘rough’ proof outlines in a format close to that accepted 
by DECLARE, and repeatedly refining these proofs until they are accepted 
as correct by DECLARE’s automated proof checker. DECLARE supports the 
expression of proofs in a simple case-decomposition language that resembles 
the style used by mathematicians. Most importantly, it supports a migratory 
path from a rough outline to a machine acceptable outline. 



The methodology is like the ‘waterfall’ methodology of software development: 
each step can require a return to previous steps, and we iterate until the task 
is complete. Some steps (e.g. validation) can be highly automated or skipped 
in later iterations. The methodology differs substantially from that applied to 
many previous theorem proving projects: it is top-down, especially when we write 
proofs. The advantages of such an approach are well understood from software 
engineering, and our tool, DECLARE, has been developed especially with the aim
of supporting it. 
Surprisingly, the process of writing rough proof sketches was the most valuable
stage in the work. It was here that the flaw in the original proof was discov-
ered (see Section 6). An important by-product of this stage is identifying the key 
lemmas about component constructs that support the argument. Our methodol- 
ogy supports this elegantly: unless you are formalizing a well-established corpus 
of mathematics, the necessary lemmas are not at all obvious a priori, even if 
the general direction is clear. Thus support for top-down proof development is 
essential. 

2 Our Model of Java_S

With issues of methodology out of the way, we move on to our proof of Java type 
soundness. However, before we get to the proof itself, we present the details of 
our model of Java_S. We will inherit much from Chapter 3, so we concentrate on
the areas where our model differs. The material in this section is quite technical 
and there are many “building-blocks” to consider: the reader is encouraged to 
refer back as needed. 

The aim of the type correctness proof is to bridge the gap between: 

— A model of the static checks performed on Java_S programs; and

— A model of the runtime execution of Java_S programs.

This section is devoted to describing these two components, which we will
connect in Section 3. A picture of the components of the semantics is shown in
Figure 1. The “annotated” language Java_A is the result of the static checking
process and the “machine-code” language Java_R is the code executed at runtime.

Fig. 1. Components of the Semantics and their Relationships

Our semantics were originally based on that developed by Drossopoulou and
Eisenbach in version 2.01 of their paper [DE97]. The main differences between
our semantics and this version are outlined below. Some of these suggestions
have been incorporated into the version presented in Chapter 3.

— We correct minor mistakes, such as missing rules for null pointers, some
definitions that were not well-founded (e.g. those for MSigs, FDecs and FDec),
some typing mistakes and some misleading/ambiguous definitions (e.g. the
definition of MethBody, and the incorrect assertion that any primitive type
widens to the null type).

— We choose different representations for environments, based on tables (par- 
tial functions) rather than lists of declarations. 

— We differentiate between the source language Java_S, the annotated language
Java_A and the ‘runtime terms’ Java_R. The latter are used to model execution
and enjoy subtly different typing rules.

— We adopt a suggestion by von Oheimb (see Chapter 5) that well-formedness 
for environments be specified without reference to a declaration order. 

— We allow the primitive class Object to have an arbitrary set of methods (in
Chapter 3 Object has no methods). It was when considering this extension 
that one mistake in the Java Language Specification was discovered (see 
Section 6). 

— We do not use substitution during typing, as it turns out to be unnecessary 
given our representation of environments. 

— At runtime we do not choose arbitrary new names for local variables when 
calling a procedure, but use a system of ‘frames’ of local variables that makes 
reasoning about substitution easier (and is also closer to a real implementa- 
tion based on stacks and offsets). 

— The modeling of multi-dimensional arrays in version 2.01 of Drossopoulou 
and Eisenbach’s paper was not faithful to the Java Language Specification, 
where sub-array dimensions are not constant. 

— Arrays in Java support the methods supported by the class Object (e.g.
hashValue()). We include this in our model (with non-trivial consequences).
However our model of arrays is still incomplete, as Java arrays support cer- 
tain array-specific methods and fields, whereas in our treatment they do 
not. 

Figure 2 presents the abstract syntax of Java_S programs.



2.1 The Static Semantics: Environments, Widening, and Visibility 

Chapter 3 has already covered the basic components of the static semantics for 
Java_S. The complicating factors for the static semantics are:

— Java allows the use of classes before they are defined. A non-circular class 
and interface hierarchy must result. Thus type-checking environments were 
defined, extracted from all the classes and interfaces that make up a program. 
A well-formedness condition is required for these. 

— Java allows subtyping in a typical object-oriented fashion, which leads to the
widening ($\le_{wdn}$) relation.

— Defining well-formedness for type-checking environments requires knowledge 
of what identifiers are visible from subclasses. Visibility is defined by relations 
for traversing the class and interface hierarchies. 
Fig. 2. The Abstract Syntax of Java_S
— Java implementations disambiguate field and method references at
compile-time. Method calls may be statically overloaded (not to be confused with
the object oriented late-binding mechanism), and fields may be hidden by 
superclasses. 

Type checking environments contain several components (Figure 4). Always 
present are tables of class and interface declarations, and when typechecking 
inside method bodies we add a table of variable declarations. We write environments
as records $\langle \ldots \rangle$, and omit record tag names when it is obvious which
record field is being referred to.^4 In the machine acceptable model tables are
represented as partial functions, and sets as predicates:

$\alpha \stackrel{table}{\mapsto} \beta \;=\; \alpha \rightarrow \beta\ \mathit{option}$
$\mathit{set\ of}\ \alpha \;=\; \alpha \rightarrow \mathit{bool}$

where the type function option has the standard definition. 
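
As a rough illustration of this encoding (a Python sketch, not DECLARE text; the helper names `table_from_dict` and `set_from_iterable` and the sample entries are hypothetical), a table can be modelled as a function that returns `None` outside its domain, and a set as a boolean predicate:

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, TypeVar

A = TypeVar("A", bound=Hashable)
B = TypeVar("B")

# A table is a partial function: alpha -> beta option (None plays the role of NONE).
Table = Callable[[A], Optional[B]]
# A set is a predicate: alpha -> bool.
SetOf = Callable[[A], bool]


def table_from_dict(entries: Dict[A, B]) -> Table:
    """Build a partial function from a finite mapping (hypothetical helper)."""
    return lambda key: entries.get(key)


def set_from_iterable(items: Iterable[A]) -> SetOf:
    """Build a membership predicate from a finite collection (hypothetical helper)."""
    members = frozenset(items)
    return lambda item: item in members


# Illustrative data only: a variable environment and a set of interface names.
variable_env = table_from_dict({"x": "int", "p": "Phil"})
interfaces = set_from_iterable(["Runnable"])
```

Looking up a name that is not declared yields `None`, which corresponds to the `NONE` case that the logical specification has to handle explicitly.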



class-env = class-names $\stackrel{table}{\mapsto}$ class-dec
interface-env = interface-names $\stackrel{table}{\mapsto}$ interface-dec
variable-env = variable-names $\stackrel{table}{\mapsto}$ type
class-dec = $\langle$ super: class-name,
              interfaces: set of interface-names,
              fields: field-names $\stackrel{table}{\mapsto}$ type,
              methods: method-names $\times$ arg-types $\stackrel{table}{\mapsto}$ expr-type $\rangle$
interface-dec = $\langle$ superinterfaces: set of interface-names,
              methods: method-names $\times$ arg-types $\stackrel{table}{\mapsto}$ type $\rangle$

Fig. 4. Type checking environments



We use $\Gamma$ for a composite environment, $\Gamma^C$, $\Gamma^I$ and $\Gamma^V$ for its respective
components, and $\Gamma(x)$ for the lookup of $x$ in the appropriate table. We also use $x \in \Gamma$
to indicate that $x$ is defined in the relevant table in $\Gamma$.

Component types, array types, reference types and regular types are said to
be well-formed, written $\Gamma \vdash object\ \diamond_{syntax\text{-}category}$ (e.g. .. |- .. wf_class in
the DECLARE specification) if all classes and interfaces are in scope.

Next we define the subclass ($\sqsubseteq_{class}$ = subclass_of), subinterface ($\sqsubseteq_{int}$ =
subinterface_of) and implements ($:_{imp}$ = implements) relations as shown below.
All classes are a subclass of the special class Object, though we do not have
to mention this explicitly as the well-formedness conditions for environments will 
ensure it. 

^4 In the machine acceptable model we do not use such conveniences: records are
represented as tuples.
In the DECLARE specification, the $\diamond_{class\text{-}name}$ and $\sqsubseteq_{class}$ relations are
defined by the DECLARE text

inductive_relation wf_class
  (Object) [rw,prolog]
  ------------------------------------
  TE |- "Object" wf_class

  (Decl) [rw,prolog]
  Cdec(TE,C) = SOME(classdec)
  ------------------------------------
  TE |- C wf_class

inductive_relation subclass_of
  (Refl) [rw,prolog]
  ------------------------------------
  TE |- C subclass_of C

  (Step) [prolog]
  Cdec(TE,C) = SOME(CLASS(C',_,_,_)) &
  TE |- C' subclass_of C''
  ------------------------------------
  TE |- C subclass_of C''

Here TE is the type environment, and contains a partial function from class- 
names to class declarations. Extra syntactic detail is required such as SOME to 
indicate that the class is actually in the domain of CE. 

Keywords such as rw and prolog are “pragmas” that provide interpretative 
information to proof tools when the specification is used as a set of logical axioms: 
in particular rw indicates that the rule can be safely used as a (conditional) 
rewrite, and prolog that the rule can be safely used as a backchaining Prolog- 
style rule. 
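
To make the backchaining reading of these rules concrete, here is a minimal Python sketch (not generated from the DECLARE text). It assumes a hypothetical class table `CLASSES` mapping each class name to its superclass name, and the `seen` set is an addition that keeps the search finite even on an ill-formed, circular table:

```python
from typing import Dict, Optional

# Hypothetical class table: class name -> superclass name (None for Object).
CLASSES: Dict[str, Optional[str]] = {
    "Object": None,
    "Phil": "Object",
    "FrPhil": "Phil",
}


def wf_class(te: Dict[str, Optional[str]], c: str) -> bool:
    """(Object): "Object" is always well-formed. (Decl): other classes must be declared."""
    return c == "Object" or c in te


def subclass_of(te: Dict[str, Optional[str]], c: str, target: str) -> bool:
    """(Refl): a class is a subclass of itself. (Step): follow the superclass link."""
    seen = set()  # not part of the rules: guards against circular tables
    current: Optional[str] = c
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)
        current = te.get(current)
    return False
```

The loop corresponds to repeatedly applying (Step) until (Refl) applies or the chain of superclasses runs out.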

Subtyping in Java is the combination of the subclass, subinterface and im- 
plements relations, and is called widening. Defining widening accurately in the 
machine acceptable model turns out to be a tedious but instructive process: we 
define it incrementally over the different kinds of types, i.e. over simple reference
types ($\le_{sref}$) then component types ($\le_{comp}$) then array types ($\le_{arr}$) and so on
through to regular types ($\le_{wdn}$). We have to be careful about this to avoid errors
that creep in by other approaches: e.g. in the original formulation it appeared
that all primitive types are subtypes of Object, when in fact only reference types
are.

The full rules for widening are given in Appendix B. Note the covariant rule
for arrays eventually leads to the need for runtime typechecking. 
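
The following Python sketch illustrates why covariance forces a runtime check. It is a toy model under stated assumptions: the class table `SUPER`, the `TypedArray` class and the `ArrayStoreError` exception are all hypothetical names, and stored values are represented only by their class names.

```python
# Hypothetical class table: class name -> superclass name (None for Object).
SUPER = {"Object": None, "Phil": "Object", "FrPhil": "Phil"}


def widens_to(value_class, component_class):
    """True if value_class is component_class or one of its subclasses."""
    current = value_class
    while current is not None:
        if current == component_class:
            return True
        current = SUPER.get(current)
    return False


class ArrayStoreError(Exception):
    """Raised when a stored value does not fit the array's runtime component type."""


class TypedArray:
    def __init__(self, component_class, length):
        # The heap array remembers the component type it was created with.
        self.component_class = component_class
        self.cells = [None] * length

    def store(self, index, value_class):
        # Static typing may have viewed this array at a wider type (covariance),
        # so the fit against the actual component type is checked at runtime.
        if not widens_to(value_class, self.component_class):
            raise ArrayStoreError(f"{value_class} does not fit {self.component_class}[]")
        self.cells[index] = value_class
```

An array created with component type `FrPhil` may be statically viewed as an array of `Phil`; storing a plain `Phil` into it is then accepted by the static rules but is rejected by `store`.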

The functions FDec, FDecs and MSigs tell us what fields and methods are visible from a
given class or interface. They traverse the subclass/subinterface graphs, starting 
at a particular class/interface. 

— FDec: Finds the ‘first visible’ definition of a field starting at a particular 
class. A set is returned, with at most one element when the environment is 
well-formed. 

— MSigs: Finds all the methods visible from a reference type. Methods with 
identical argument descriptors hide methods further up the hierarchy, though 
return types may be different. 

— FDecs: Finds all fields, including hidden ones. This is used to determine the 
runtime fields of an object. 

In Drossopoulou and Eisenbach’s original formulation these definitions were 
given as recursive functions, and only make sense for well-formed environments, 
as the search may not terminate for circular class and interface hierarchies. Un- 
fortunately the constructs are themselves used in the definition of well-formedness 
below. To avoid this problem we define the constructs as inductively defined sets. 
Our definitions are given in Appendix C. 
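
The termination problem can be seen in a small Python sketch. This is not the inductive definition of Appendix C: it is a traversal with a visited set, which stands in for it by being total even on circular hierarchies. The table layout (class name mapped to a pair of superclass and field table) and the field name `like` are assumptions made for illustration.

```python
def fdecs(classes, c):
    """All (declaring class, field, type) triples reachable from c, hidden fields included."""
    found, seen = [], set()
    while c is not None and c in classes and c not in seen:
        seen.add(c)  # a naive recursive definition would loop here on a cycle
        superclass, fields = classes[c]
        found.extend((c, name, ty) for name, ty in fields.items())
        c = superclass
    return found


def fdec(classes, c, field):
    """The 'first visible' declaration of field starting at c, as a set of at most one element."""
    for owner, name, ty in fdecs(classes, c):
        if name == field:
            return {(owner, ty)}
    return set()


# Illustrative hierarchy: FrPhil hides the field `like` declared in Phil.
EXAMPLE = {
    "Object": (None, {}),
    "Phil": ("Object", {"like": "Phil"}),
    "FrPhil": ("Phil", {"like": "Food"}),
}
```

On a circular table such as `{"A": ("B", {}), "B": ("A", {})}` both functions still return, which is the property the well-formedness definition needs before it can rule such tables out.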

MSigs is defined by first defining MSigs_C, MSigs_I and MSigs_A for the visible
methods from the three different reference types. The methods visible from ar- 
rays and interfaces include all methods found in the type Object. Whether this 
should be the case for interfaces is the subject of discussion in Section 6.1. 

2.2 The Static Semantics: Well-Formedness, Type Checking, and
Java_A

Well-formedness for type checking environments ($\Gamma \vdash \diamond_{tyenv}$ = wf_tyenv)
is essential to ensure that subclasses provide methods compatible with their
superclasses. Drossopoulou and Eisenbach originally formulated this by an
incremental process, where the environment is constructed from a sequence of
definitions. We originally followed this formulation, but von Oheimb (see Chapter 5)
has pointed out that this is not necessary, since the definition is independent of
any ordering constraints (however a finiteness constraint is needed to ensure that no
infinite chains of classes not terminating in ‘Object’ exist).

In the machine acceptable model, every class declaration in a well-formed 
environment must satisfy: 

— Its superclass and implemented interfaces must be defined and no circulari- 
ties can occur in the hierarchy; 

— Any methods that override inherited methods (by having the same name 
and argument types) must have a narrower return type; 
— All interfaces must be implemented by methods that have narrower return
types. 

In addition the class Object must be defined and have no superclass, superin- 
terfaces or fields. In the DECLARE specification these are written as follows (the 
italicized labels are used in proofs to refer to facts that are deducible from the 
well-formedness of an environment). 



Cdec(TE,C) = SOME(CLASS(Csup,Is,fields,methods)) →
  (∀Csup'. Csup = SOME(Csup') →
    ~(TE |- Csup' subclass_of C)  [no_circular_classes] &
    (∀m at rt1. MSigsC(TE,Csup')(m,MT(at,rt1)) →
      (∃rt2. MSigsC(TE,C)(m,MT(at,rt2)) &
        TE |- rt2 expty_widens_to rt1))  [class_return_types_wider]) &
  (Csup = NONE →
    C = "Object"  [only_Object_has_no_superclass] &
    fields = {}  [Object_has_no_fields] &
    Is = {}  [Object_implements_no_interfaces]) &
  (∀m mt1. (m,mt1) :: methods →
    ∀mt2. (m,mt2) :: methods →
      Args(mt1) = Args(mt2) → mt1 = mt2)  [class_argtypes_unique]

Several constraints mentioned in Chapter 3 are guaranteed by the types of the 
constructs we have used to represent environments. A similar set of constraints 
must hold for each interface declaration. 
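
Two of these conditions are easy to state as executable checks. The Python sketch below is an illustration only, assuming a hypothetical table `supers` from class names to superclass names (`None` for Object) and methods given as `(name, argument types, return type)` triples; the function names merely echo the labels used above.

```python
def no_circular_classes(supers, c):
    """The superclass chain starting above c never leads back to c."""
    seen = set()
    current = supers.get(c)
    while current is not None and current not in seen:
        if current == c:
            return False
        seen.add(current)
        current = supers.get(current)
    return True


def class_argtypes_unique(methods):
    """No two method entries share a name and argument types but differ otherwise."""
    by_signature = {}
    for name, arg_types, return_type in methods:
        key = (name, tuple(arg_types))
        if key in by_signature and by_signature[key] != return_type:
            return False
        by_signature[key] = return_type
    return True
```

Overloads that differ in their argument types pass the second check; two entries with the same name and argument types but different return types do not.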

We can now define the static checks performed on Java_S programs, and can
assume we are operating with a well-formed type environment. Our rules differ
from Chapter 3 only in that we translate to a simpler, annotated version of Java_S
called Java_A rather than the runtime language Java_R. We later prove that the
compilation process preserves types.

We do not give the full details of the typing rules here, since they follow
the rules given in Chapter 3 very closely. The rules give rise to a series
of relations for Java_S (var_hastype, exp_hastype, stmt_hastype through
to prog_hastype) and similar relations for Java_A (avar_hastype through to
aprog_hastype). The annotation process is also described by a relation ($\leadsto_{ann}$
= prog_annotates_to). As an example, the typing rule for references to local
(stack) variables in both the unannotated and annotated languages is:

PLOOKUP(VE)(x) = SOME(vt)
------------------------------------
(TE,VE) |- Id(x) var_hastype vt



The typing rule for method calls in the unannotated language is: 
LEN(args) = n &
LEN(argtys) = n &
(∀j. j < n → (TE,VE) |- EL(j)(args) exp_hastype SOME(EL(j)(argtys))) &
(TE,VE) |- e exp_hastype SOME(vt) &
MostSpecC(TE,vt,m,argtys)(mt) &
(∀y. MostSpecC(TE,vt,m,argtys)(y) → mt = y)
------------------------------------------------------------
(TE,VE) |- Call(e,m,args) exp_hastype Res(mt)

Note that iterated constructs are replaced by (bounded) universal quantification: 
the first three lines of the rule correspond to a side condition like: 

each $arg_i$ has some type $ty_i$ ($1 \le i \le n$)

Note the index change to take advantage of the inbuilt theory of natural numbers 
and zero-indexed lists and the use of the inbuilt list operators EL and LEN. The 
definition of MostSpec can be found in Chapter 3. The typing rule for the same 
construct in the annotated language is: 



LEN(args) = nargs &
LEN(AT) = nargs &
(TE,VE) |- e aexp_hastype SOME(vt) &
MSigs(TE,vt)(m,MT(AT,rt))
------------------------------------------------------------
(TE,VE) |- CallA(e,AT,m,args) aexp_hastype rt

This completes our presentation of the static checks performed for the Java_S
language. We now move on to the runtime model of execution.

2.3 The Runtime Semantics: Configurations, Runtime Terms, and 
State 

Chapter 3 models execution by a transition semantics, i.e. a ‘small step’ rewrite 
system [Plo91]. A configuration (s, t) of the runtime system has a state s and 
a runtime term (rterm) t. The rterm represents expressions yet to be evaluated 
and the partial results of terms evaluated so far. The configuration is progres- 
sively modified by making reductions. The rewrite system specifies an abstract 
machine, which is an inefficient but simple interpreter for Java_S.

A small step system is chosen over a ‘big step’ (evaluation semantics) since 
we want to reason about non-terminating programs, and later will want to model 
non-determinism and concurrency. Using a small-step system imposes significant 
overheads in the type soundness proof (e.g. with a big-step rewrite system certain 
intermediary configurations need not be considered), but this seems unavoidable. 
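
The difference between the two styles can be sketched on a toy language. The block below is a minimal illustration only, not part of the DECLARE model: the names `SmallStep`, `Lit`, `Add`, `step` and `trace` are hypothetical, and the term language (integer literals and addition) is far smaller than the Java subset. It shows how a small-step system exposes every intermediate term, which is exactly what the soundness proof has to account for.

```java
import java.util.ArrayList;
import java.util.List;

class SmallStep {
    // Hypothetical toy term language: integer literals and additions only.
    static abstract class Term {}

    static final class Lit extends Term {
        final int value;
        Lit(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Add extends Term {
        final Term left, right;
        Add(Term left, Term right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    // A term is ground when no further reduction applies.
    static boolean isGround(Term t) { return t instanceof Lit; }

    // Performs exactly one reduction; the caller must pass a non-ground term.
    static Term step(Term t) {
        Add a = (Add) t;
        if (!isGround(a.left)) return new Add(step(a.left), a.right);   // reduce inside the left operand
        if (!isGround(a.right)) return new Add(a.left, step(a.right));  // reduce inside the right operand
        return new Lit(((Lit) a.left).value + ((Lit) a.right).value);   // both operands ground: compute
    }

    // Collects every intermediate term, which a big-step evaluator never exposes.
    static List<String> trace(Term t) {
        List<String> out = new ArrayList<>();
        out.add(t.toString());
        while (!isGround(t)) {
            t = step(t);
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Term t = new Add(new Add(new Lit(1), new Lit(2)), new Lit(3));
        for (String s : trace(t)) System.out.println(s);
    }
}
```

The first two branches of `step` play the role of the redex rules described later, and the last branch is the only one that actually computes.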

Runtime terms (the language Java_R) are Java_A programs with the addition
of addresses, exception packets and the method bodies that have been called.
There are three types of rterms: expressions, variables and statements, and thus
there are really three different types of configurations. As an intuition for what
we mean by this, consider the “top level” configuration: it always contains an
expression (and a state) since Java begins execution with the main static method
from a given class, and this eventually evaluates to an integer.

In our model, the program state consists of two components: a list of frames
of local variables and a heap containing objects and arrays. Neither is garbage
collected. Heap objects are annotated with types for runtime typechecking (in
the case of arrays this is the type of values stored in the array).

    state       = ( frames : (id ↦ val) list,
                    heap   : addr ↦ heap-object )

    heap-object = ⟨⟨ fld_1 ↦ val_1, ..., fld_n ↦ val_n ⟩⟩^C    (object)
                | [[ val_1, ..., val_n ]]^ty                    (array)

Fig. 5. State

We use the
symbol ⊕ to indicate adding a new frame at the next available frame index,
s(id) and s(addr) for looking up local variables and objects, and s(id) ← val and
s(addr) ← heap-obj for assigning things into the respective components of the
state.

2.4 The Rewrite System 

The reduction of rterms is specified by three relations, one for each syntax
category: →exp_(Γ,p), →var_(Γ,p) and →stmt_(Γ,p) (= exp_reduces_to,
var_reduces_to etc.).

Global parameters are an environment Γ (containing the class and interface hi-
erarchies, needed for runtime typechecking) and the program p being executed.
p contains Java_A terms: each time a method is executed we create a Java_R term
for the body of that method.

A term is ground if it is in normal form, i.e. no further reduction can be 
made. Being ground is a syntactic test, and the test can depend on the syntax 
category “from which a term is viewed” . A local variable id is ground if id is a 
variable, but not ground if id is an expression (this is the standard distinction 
between lvalues and rvalues in C). Formally, ground is defined as follows: 

— A value is ground iff it is a primitive value or an address.

— An expression is ground iff it is a ground value.

— A variable is ground iff all its component expressions are ground.

— A statement is ground iff it is an empty block of statements or a ground
expression.

Footnote (garbage collection): In future versions of the semantics a garbage
collection rule collecting inaccessible items at any time may be added. Garbage
collection is semantically visible in Java because of the presence of ‘finalize’
methods that get called before an object is deallocated.

There are 36 rules in our rewrite system. 15 of them are “redex” rules that specify 
the reduction of expressions in the cases where sub-expressions have reductions. 
A sample from the DECLARE specification is: 



    (stmt,s) stmt_reduce(TE,p) (stmt',s')
    -----------------------------------------------------------------
    (RBlock(stmt#stmts),s) stmt_reduce(TE,p) (RBlock(stmt'#stmts),s')

11 of the rules specify the generation of exceptions: 5 for null pointer dereferences,
4 for bad array index bounds, one for a bad size when creating a new array and
one for runtime type checking when assigning to arrays. A simple example is:



    exp_ground(exp) = T &
    val_ground(val) = T
    ---------------------------------------------------------
    (RAssign(RAccess(RValue(RAddr(NONE)),exp),RValue(val)),s)
        stmt_reduce(TE,p)
    (RExpr(RValue(RExn("NullPointExc"))),s)

The array creation rule is:

    ndims = LEN(dims) &
    (∀j. j < ndims --> exp_ground(EL(j)(dims))) &
    LEN(dimns) = ndims &
    (∀j. j < ndims -->
        ∃i32. EL(j)(dims) = RValue(RPrim(Int(i32))) &
              dest_int32(i32) >>= 0 &
              dest_int32(i32) = EL(j)(dimns)) &
    s = (frames,heap) &
    alloc(heap,st,dimns,ext) = (val,heap') &
    (frames,heap') = s'
    ------------------------------------------------------------
    (RNewArray(st,dims,ext),s) exp_reduce(TE,p) (RValue(val),s')

Here alloc represents the recursive process of allocating k_1 × ... × k_{n-1} arrays that
eventually point to initial values appropriate for the element type. This process is
described in detail in [GJS96].

Footnote (array creation): This model of array creation would need to be modified
if threads or constructors are considered. Array creation is not atomic with
respect to thread execution. It may execute constructors (and thus may not even
terminate), and may raise an out-of-memory exception.

2.5 Runtime Typechecking

Java performs runtime typechecks at just two places: during array assignment, 
and when casting reference values. Runtime typechecking is needed for array 
assignment because of the well-known problem with a co-variant array typing 
rule. Casts are not covered in this chapter: they are a trivial extension once 
runtime checking for arrays is in place. 
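
The covariance problem can be reproduced in a few lines of ordinary Java. This is a minimal sketch for illustration: the classes `C` and `D` and the helper `storeRejected` are hypothetical names, not taken from the formal model. The method `silly` is statically well typed, yet the store must be checked at runtime because the array may really be a `D[]`.

```java
class CovariantArrayDemo {
    static class C {}
    static class D extends C {}   // hypothetical class narrower than C

    // Statically well typed: arr[0] and s both have static type C.
    static void silly(C[] arr, C s) { arr[0] = s; }

    // Returns true when the store is rejected by the runtime typecheck.
    static boolean storeRejected(C[] arr, C s) {
        try {
            silly(arr, s);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        C[] narrow = new D[1];  // permitted by the covariant array rule
        System.out.println(storeRejected(narrow, new C()));    // a plain C does not fit a D[] cell
        System.out.println(storeRejected(narrow, new D()));    // a D does fit
        System.out.println(storeRejected(new C[1], new C()));  // no narrowing, no failure
    }
}
```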

Runtime typechecking is performed by simply checking that the real (i.e. 
runtime) type of any reference object, as stored in the state, is narrower than 
the real type of the array cell it is being assigned to. This means the runtime
system must have access to the program's class and interface hierarchies (as the
JVM does).

An aside: The notion of runtime type checking from Chapter 3 (weak con- 
formance) is a little too strong: it allows the runtime machine to check the 
conformance of primitive values to primitive types. No realistic implementa- 
tion of Java checks at runtime that a primitive type such as int fits in a 
given slot. The problem stems from the fact that conformance is used for two 
purposes: 

— to represent the procedure invoked at runtime to do runtime typechecking 

— as an abstract concept used to formulate type soundness. 

The function typecheck checks that a stored type is compatible with a given
type. It succeeds for an address addr, a type ty in a heap h if:

— h(addr) = ⟨⟨...⟩⟩^C and ty is wider than C

— or h(addr) = [[...]]^ty' and ty is wider than ty'[]

In future versions of the semantics this will not perform compatibility checks for 
primitive or null values. 
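
As a rough illustration of this check, the sketch below models it in Java under several simplifying assumptions that are not in the original specification: types are plain strings, array types are written with a trailing `[]`, the class table is a map from each class to its direct superclass, interfaces are ignored, and the heap stores only the type tag of each cell. The names `TypecheckSketch`, `widensTo` and `typecheck` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class TypecheckSketch {
    // True when 'wide' is wider than (or equal to) the reference type 'narrow'.
    // supers maps each class name to its direct superclass; Object has no entry.
    static boolean widensTo(Map<String, String> supers, String narrow, String wide) {
        if (narrow.equals(wide) || wide.equals("Object")) return true;
        boolean narrowArr = narrow.endsWith("[]");
        boolean wideArr = wide.endsWith("[]");
        if (narrowArr || wideArr) {
            // Arrays widen covariantly on their element type.
            return narrowArr && wideArr
                && widensTo(supers, stripArray(narrow), stripArray(wide));
        }
        String parent = supers.get(narrow);
        return parent != null && widensTo(supers, parent, wide);
    }

    static String stripArray(String arrayType) {
        return arrayType.substring(0, arrayType.length() - 2);
    }

    // heapTypes maps an address to the type tag stored with the heap cell:
    // a class name for objects, "elem[]" for arrays.
    // Assumption of this sketch: an unallocated address simply fails the check.
    static boolean typecheck(Map<Integer, String> heapTypes, Map<String, String> supers,
                             int addr, String ty) {
        String stored = heapTypes.get(addr);
        return stored != null && widensTo(supers, stored, ty);
    }

    public static void main(String[] args) {
        Map<String, String> supers = new HashMap<>();
        supers.put("C", "Object");
        supers.put("D", "C");
        Map<Integer, String> heap = new HashMap<>();
        heap.put(1, "D");
        heap.put(2, "D[]");
        System.out.println(typecheck(heap, supers, 1, "C"));
        System.out.println(typecheck(heap, supers, 2, "C[]"));
        System.out.println(typecheck(heap, supers, 1, "D[]"));
    }
}
```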

2.6 The Model as a DECLARE Specification 

DECLARE specifications can be interpreted as axioms in an appropriate logic, 
or, if executable, as a specification of an interpreter. The documents we have 
seen fragments from are abstracts, i.e. summaries of theories that are checked to 
be consistent extensions of higher order logic. The declaration forms available 
are simple (non-recursive) definitions, recursive datatype definitions (mutually 
recursive and recursive through positive type functions like list), inductive rela- 
tions (again mutually recursive, with any monotonic operators), and recursive 
functions with a well-founded measure.

The syntax classes and semantic objects (exp, type, state etc.) are easily
defined in DECLARE using logical datatypes, partial functions and sets — we
will not give an example here. As we have seen, inductive relations [CM92,Pau94]
are formulated by specifying a set of rules, and giving a name to each. When
treated as a logical specification, DECLARE generates the appropriate axioms
for the least fixed point of the set of rules.

Footnote (DECLARE features): Not all the features listed here are fully
implemented in the version of DECLARE used for this work, for example
monotonicity conditions are not currently checked.

Formalizing the runtime rewrite system as an inductive relation is relatively
straightforward given DECLARE’s collection of background theories.

The machine-acceptable specification runs to around 2500 lines in total. The 
use of three similar versions of the language results in some unfortunate dupli- 
cation that seems hard to avoid: e.g. we have three sets of typing rules that are 
very similar. Perhaps most importantly, the specification was easily read and 
understood by the authors of Chapter 3 when shown to them. 

3 Formulating Key Properties 

In this section we formulate, in the terms of the model just developed, the 
type soundness property we will prove. The main substantive differences with 
Drossopoulou and Eisenbach’s original formulation are: 

— We distinguish between the safety property (type soundness) and a liveness 
property, that says that the runtime machine can always proceed if the result 
is not yet ground. This formulation is correct for non-deterministic language 
constructs. 

— We correct the rule for typing array assignments at runtime. 

Loosely speaking, type soundness says that as evaluation progresses the config- 
uration of our rewrite system always conforms to the types we expect, and that 
terms only ever narrow in type. 

A frame typing F is a list of tables of types for local variables. A frame typing 
indicates what types we expect local variables to conform to. We require other 
auxiliary concepts too: 

— typing for rterms (rexp_hastype, rvar_hastype etc.);

— self-consistency of a heap (◇_heap = wf_heap);

— conformance between frame typings and the local variables in a state
(≤_frames = frames_conform_to);

— conformance between two heaps (≤_heap = heap_conforms_to);

— widening between two frame typings (≤_ftyp = ftyenv_leq)

and define these in the sections that follow. 

Theorem 1 (Type Soundness). For a well-formed type environment Γ, an an-
notated program p that typechecks and a state s0 that conforms to some frame
typing F0, if a well-typed term t0 rewrites to some t1 and a new state s1, then
either t1 represents a raised exception, or there exists a new, larger frame typing
F1 such that t1 has some narrower type than t0 in the new state and environment,
and s1 conforms to F1. That is, if

— ⊢ Γ ◇_tyenv

— Γ ⊢ p ◇

— Γ ⊢ h0 ◇

— Γ, h0 ⊢ f0 ≤_frames F0

— Γ, h0, F0 ⊢ t0 : ty0 and

— t0, s0 →_(Γ,p) t1, s1 with s0 = (f0, h0) and s1 = (f1, h1)

then t1 represents an exception or there exist F1 and ty1 such that

— Γ ⊢ h1 ◇

— Γ, h1, F1 ⊢ t1 : ty1 and

— Γ ⊢ ty1 ≤_wdn ty0.

Note we assume a reduction is made, rather than proving that one exists. This
distinguishes the safety property from the liveness property. In the presence
of non-determinism it is not sufficient to prove that a safe transition exists: we 
want to show that all possible transitions are safe. 

Type soundness is in fact three properties, one for each syntax category 
within rterms. For variables we write the property in DECLARE as follows (a 
similar property is used for expressions and statements): 



    TE wf_tyenv ∧
    TE |- p aprog_hastype ∧
    TE |- heap0 wf_heap ∧
    (TE,heap0) |- frames0 frames_conform_to F0 ∧
    (TE,F0,heap0) |- var0 rvar_hastype ty0 ∧
    s0 = (frames0,heap0) ∧
    s1 = (frames1,heap1) ∧
    (var0,s0) var_reduce(TE,p) (var1,s1)
    -->
    exceptional_var(var1)
    ∨ ∃F1 ty1.
        TE |- heap1 wf_heap ∧
        (TE,heap1) |- frames1 frames_conform_to F1 ∧
        (TE,F1,heap1) |- var1 rvar_hastype ty1 ∧
        TE |- ty1 widens_to ty0



The proof of type soundness is by induction on the derivation of the typing 
judgment for t0. The outline originally sketched by Drossopoulou and Eisenbach
is a good guide, but is certainly ‘rough around the edges.’ The latter two syntax 
categories do not have types, so the statements are simpler. In the proof we 
strengthen the induction invariant in the following ways: 

— Γ ⊢ h0 ≤_heap h1,

— F1 ≤_ftyp F0,

— If t0 is a field variable, then t1 is also and ty0 = ty1. This is needed because
field types on the left of assignments cannot narrow, otherwise runtime type-
checking would be needed.

— If t0 is an array variable, then t1 is also, and similarly for stack variables.

3.1 Typing for Rterms and Conformance

As in Chapter 3, the typing rules at runtime are those for annotated expres- 
sions, with the addition of rules for addresses. These make the typing relation 
dependent on the current heap. 

Note that we no longer demand unique typing: null values can be considered 
to have any reference type. The rule for assignments must also be different: the 
new rule drops the requirement that the source type be narrower than the target 
type in the case of array assignments, since this will be checked by runtime type 
checking. We will return to this issue in Section 6. The new rule for arrays is: 

    Γ,h ⊢ e : ty'
    Γ,h ⊢ arr[idx] : ty
    --------------------------
    Γ,h ⊢ arr[idx] := e : void

The definitions of conformance we use are similar to those in Chapter 3: A
value v weakly conforms to a type ty with a heap h and type environment Γ if

— ty is a primitive type and v is an element of that primitive type; or

— ty is a reference type and v is a null pointer; or

— v is an address whose value upon dereferencing h(v) is an instance of a class
type C and Γ ⊢ C ≤_wdn ty; or

— v is an address whose value upon dereferencing h(v) is an array with elements
of type ty' and Γ ⊢ ty'[] ≤_wdn ty.

Value conformance states that the components of an object or array weakly
conform. A value v conforms to a type ty with heap h and type environment Γ
if v weakly conforms to ty and

— if v is an address then h(v) = ⟨⟨fldvals⟩⟩^C and for each (field, idx, ty') ∈
FDecs(C), fldvals(field) is defined and weakly conforms to ty'; and

— if v is an address then h(v) = [[vec]]^ty' and each val ∈ vec weakly conforms
to ty'.

A heap h is self-consistent in Γ, written Γ ⊢ h ◇, if these hold:

— if addr is an address and h(addr) = ⟨⟨fldvals⟩⟩^C then addr conforms to C.

— if addr is an address and h(addr) = [[vec]]^ty' then addr conforms to ty'[].

A set of frames f conforms to a frame typing F (with a heap h and in Γ), written
Γ,h ⊢ f ≤_frames F, if

— every local variable in every frame of f conforms to the corresponding type
given in F.

We expect each new heap to maintain value conformance in the following way:
in environment Γ a heap h1 conforms with a heap h0 at a set of addresses
A, written Γ,A ⊢ h1 ≤_heap h0, if

— for every addr in both A and h0, if addr conformed to some type ty in the
context of h0, then addr also conforms to ty in the context of h1.

We restrict the definition to a set of addresses A to allow for the possibility of
garbage collection: we would then demand continued conformance only at a set
of ‘active addresses’. Our current working definition makes A universal.



3.2 Key Lemmas 

The following is a selective list of the lemmas that form the basis for the type 
soundness proof. These have been proved using DECLARE. 

Object is the least class

    If ⊢ Γ ◇_tyenv and Γ ⊢ Object ⊑_class C then C = Object.

Widening is transitive and reflexive

    These hold for the ⊑_int, ≤_ref, ≤_comp, ≤_arr and ≤_wdn relations. The
    transitivity results only hold for well-formed environments.

Compatible fields and methods exist at subtypes

    Methods and fields visible at one type must still be visible at narrower types,
    though with possibly narrower return types. Put formally, if
        ⊢ Γ ◇_tyenv and
        Γ ⊢ C1 ⊑_class C0 and
        ((Cf,fld),ty_fld) ∈ FDecs(Γ, C0)
    then ((Cf,fld),ty_fld) ∈ FDecs(Γ, C1).

    Similarly if Γ ⊢ ty1 ≤_ref ty0 and
        (m, ty_args → ty_ret) ∈ MSigs(Γ, ty0)
    then there exists some ty'_ret with
        Γ ⊢ ty'_ret ≤_wdn ty_ret and
        (m, ty_args → ty'_ret) ∈ MSigs(Γ, ty1).

Method fetching behaves correctly

    Assuming ⊢ Γ ◇_tyenv and Γ ⊢ p ◇, then if
        (m, ty_args → ty_ret) ∈ MSigs(Γ, ty) and
        MethBody(m, ty_args, ty, p) = method
    then Γ ⊢ method : ty_ret. That is, fetching the annotated body of a method
    from the annotated program results in a method of the type we expect.

Compilation behaves correctly

    If ⊢ Γ ◇_tyenv and
        Γ ⊢ mbody : ty_ret and
        Γ ⊢ mbody →comp rmbody
    then Γ ⊢ rmbody : ty_ret. Note compilation is an almost trivial process in the
    current system, so this lemma is not difficult.

Relations are preserved under ≤_heap and ≤_ftyp

    This holds for the typing, value conformance and frame conformance rela-
    tions.

Atomic state/frame manipulations preserve ≤_heap, ≤_frames and ◇_heap

    We prove this for all primitive state manipulations, including object and
    array allocation, field, array and local variable assignment, and method call.
    The case for array allocation involves a double induction because of the
    nested loop used to allocate multi-dimensional arrays.



3.3 Annotation and Liveness 

To complement the type soundness proof, we prove that the process of annotation 
preserves types: 



    TE wf_tyenv ∧
    TE |- p prog_hastype ∧
    TE |- p prog_annotates_to p'
    -->
    TE |- p' aprog_hastype



This property is proved by demonstrating that a similar property holds for all 
syntax classes from expressions through to class bodies. We also prove liveness, 
which is again three properties, the one for variables being: 



    TE wf_tyenv ∧
    TE |- p aprog_hastype ∧
    TE |- heap wf_heap ∧
    (TE,heap) |- frames frames_conform_to F ∧
    (TE,F,heap) |- var0 rvar_hastype ty ∧
    ~var_ground var0 ∧
    s0 = (frames,heap)
    --> ∃var1 s1. (var0,s0) var_reduce(TE,p) (var1,s1)



4 Validating the Model 

We claim the specification we have developed so far is a correct formulation of 
the semantics of the Java subset we have in mind. But how do we know this, 
indeed how do we know our definitions are even logically consistent? 

Because of the style of definition we have used (least fixed points and simple 
recursive definitions), consistency is essentially trivial. Validity is harder: we 
have to measure this against the Java language standard [GJS96] and our own 
understanding of the meaning of constructs in the subset. 

We use two techniques to validate the specification: 

1. Type checking of higher order logic;

2. Compiling to ML and running test cases.

Here we concentrate on the second. Essentially we compile ‘manifestly exe-
cutable’ specifications to Objective CaML code, thus generating an interpreter
for the language based directly on our definitions. The interpreter is able to type-
check and execute concrete Java_S programs if given a concrete environment. The
interpreter is not efficient, but is sufficient to test small programs.

An example is required. The relation shown in Section 4 compiles to
an ML function that is semantically equivalent to the following (we use CaML
syntax):

    let rec subclass_of CE C =
      (fun () -> if wf_class(C) then seq_cons(C, seq_nil) else fail()) seq_then
      (fun () -> match (PLOOKUP CE C) with
                   NONE -> fail()
                 | SOME(C') -> subclass_of CE C');

where seq_nil, seq_cons and the infix operator seq_then are the obvious op- 
erations on lazy lists, used to implement backtracking. Thus subclass_of will 
return a lazy list of identifiers and acts as a non-standard model of the relation 
defined by the inductive rules. Likewise we translate recursive functions to ML 
code, though no backtracking is needed here. 

Of course, not all inductive relations or higher order logic terms are exe- 
cutable under this scheme. The executable subset is large and fairly straight- 
forward to define, however only inductive relations that satisfy strict mode con- 
straints are admitted at present. That is, arguments must be divisible into inputs 
and outputs, and inputs must always be defined by previous inputs or generated 
outputs. This concept is familiar from Prolog: the mode constraints for the ⊑_class
relation are (+,+,-). We do not translate directly to Prolog rules: this is
clearly possible but unification is almost never required when expressing ‘man- 
ifestly executable’ rules, and indeed the elimination of all implicit unification 
steps is a typical way of proving the existence of an algorithm for the relations 
defined. 
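
The lazy-list reading of a moded relation can also be sketched in Java with an iterator. This is an illustration under stated assumptions, not the generated CaML code: the class environment is modelled as a map from each class to its direct superclass, the well-formedness test `wf_class` is omitted, and the names `SubclassSearch` and `subclassOf` are hypothetical. The two inputs (mode `+`) are the environment and the starting class; the output (mode `-`) is produced on demand, one answer per call to `next()`.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

class SubclassSearch {
    // Lazily yields c, then its direct superclass, then that class's superclass, and so on.
    // ce maps a class name to its direct superclass; a class with no entry ends the search.
    static Iterator<String> subclassOf(final Map<String, String> ce, final String c) {
        return new Iterator<String>() {
            private String current = c;

            @Override public boolean hasNext() { return current != null; }

            @Override public String next() {
                if (current == null) throw new NoSuchElementException();
                String result = current;
                current = ce.get(result);  // the lookup happens only when an answer is requested
                return result;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> ce = new HashMap<>();
        ce.put("D", "C");
        ce.put("C", "Object");
        for (Iterator<String> it = subclassOf(ce, "D"); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```

A caller that only needs to know whether some particular class is reachable can stop consuming the iterator early, which is the effect backtracking over a lazy list is meant to achieve.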

Thus, DECLARE produces a CaML module for each specification we have 
written. The modules are compiled together and linked against a module which 
implements core functionality. Test cases are expressed as higher order logic 
expressions, though better would be the ability to parse, compile and run Java 
programs directly from Java source code. 

Approximately 40 errors were discovered by using these techniques. The 
breakdown of these was as follows: 

— Around 30 typing mistakes which led to mode violations. 

— Around 5 logical mistakes in the typing rules. 

— Around 5 logical mistakes in the runtime rules. 

From our experience with this technique, we would recommend that every sys-
tem used for reasoning about programming language semantics include similar
functionality.

5 The DECLARE Proof

We now outline the DECLARE proof of type soundness. The reader should keep 
in mind that when this proof was begun, the only guide available was the rough 
proof outline in [DE97], and this was based on a formulation of the problem 
that was subsequently found to contain errors. Thus the process is one of proof 
discovery rather than proof transcription! 

The proof of type soundness proceeds by induction on the derivation of the 
typing judgment for the term to. We have one case for each rule in the mutually 
recursive inductive relations that define rterm expression, statement and variable 
typing judgments. The three mutually recursive goals are specified in DECLARE 
as follows (var_types, exp_types and stmt_types are macros for the induction 
invariants): 



    if "TE wf_tyenv"                                    <auto>
       "TE |- p aprog_hastype"                          <p_types>
       "TE |- heap0 wf_heap"                            <heap0_conforms>
       "(TE,heap0) |- frames0 frames_conform_to FT0"    <frames0_conform>
       "s0 = (frames0,heap0)"                           <auto>
       "s1 = (frames1,heap1)"                           <auto>
    then
      if   "(var0,s0) var_reduce(TE,p) (var1,s1)"             <var0_reduces>
           "(TE,FT0,heap0) |- var0 rvar_hastype var0_ty"      <var0_welltyped>
      then "var_types (TE,FT0,heap0) var0 var0_ty (var1,s1)"
      or   "exceptional_var(var1)"                            <var1_exceptional>
    and
      if   "(exp0,s0) exp_reduce(TE,p) (exp1,s1)"             <exp0_reduces>
           "(TE,FT0,heap0) |- exp0 rexp_hastype exp0_ty"      <exp0_welltyped>
      then "exp_types (TE,FT0,heap0) exp0_ty (exp1,s1)"
      or   "exceptional_exp(exp1)"                            <exp1_exceptional>
    and
      if   "(stmt0,s0) stmt_reduce(TE,p) (stmt1,s1)"          <stmt0_reduces>
           "(TE,FT0,heap0) |- stmt0 rstmt_hastype"            <stmt0_welltyped>
      then "stmt_types (TE,FT0,heap0) (stmt1,s1)"
      or   "exceptional_stmt(stmt1)"                          <stmt1_exceptional>

Note how we name facts, and can have multiple (disjunctive) goals. The name 
<auto> indicates a fact should be implicitly included in all future justifications 
for this branch of the proof. The induction step of the proof is specified by: 



    proceed by rule induction on
      <var0_welltyped>, <exp0_welltyped>, <stmt0_welltyped>
      with TE,heap0,frames0,FT0,s0,heap1,frames1,s1,p constant;

This step hides a great deal of complexity: DECLARE determines the correct
induction theorem to use based on the given judgments, and computes the in-
duction predicate based on the problem statement. The with ... constant
construct tells DECLARE that the given local constants do not ‘vary during the
induction’, i.e. the induction hypotheses do not need to be general over these.
DECLARE will warn the user if a case of the induction is skipped, and can 
present the cases remaining if asked to do so. The user is free to write the cases 
in any order. 

The first case we consider is very easy: it is when var0 is a local variable.
Local variables are ground, so there are no reductions possible, and we get an 
immediate contradiction. The proof is: 



    case StackVar
      "var0 = RStackVar(frame,id)" <auto>;
      contradiction by rule cases on <var0_reduces>;

That is, by considering the different possibilities of how the fact var0_reduces is
derived, we get an immediate contradiction. We have given a claim (in this case 
the simple claim that we can derive a contradiction, though in general we assert 
a complex set of facts, possibly introducing new variables) and a justification 
(by . . .). The construct rule cases is a simple function that is an example of 
the way we specify hints that help the proof checker (other examples are simply 
quoting facts directly, or giving a fact with some explicit instantiations). DE- 
CLARE combines ‘forward’ and ‘backward’ proof: each step specifies something 
we have to prove, given our current context (in this case a contradiction). The 
automation uses a combination of proof techniques to prove the result, and the 
justification hints we give can involve little ‘forward’ proofs in their own right. 

The next case we will consider is where stmt0 assigns to an array. DECLARE
informs us of the available inductive hypotheses, though we choose our own 
names for the new variables and facts: 



    case AssignToArray
      "lval0 = RAccess(arr0,idx0)"                                     <auto>
      "stmt0 = RAssign(RAccess lval0,rexp0)"                           <auto>
      "∀exp1. (rexp0,s0) exp_reduce(TE,p) (exp1,s1)
          --> exp_types (TE,FT0,heap0) rexp0_ty (exp1,s1)
            | exceptional_exp exp1"                                    <ihyp_for_rexp0>
      "(TE,FT0,heap0) |- rexp0 rexp_hastype rexp0_ty"                  <rexp0_types_in_s0>
      "∀var1. (RAccess lval0,s0) var_reduce(TE,p) (var1,s1)
          --> var_types (TE,FT0,heap0) (RAccess lval0) lval0_ty (var1,s1)
            | exceptional_var var1"                                    <ihyp_for_lval0>
      "(TE,FT0,heap0) |- RAccess(arr0,idx0) rvar_hastype lval0_ty"     <lval0_types_in_s0>;


This case can be decomposed into three sub-cases as follows, because there are
only three interesting reductions that can occur when our top term is an assign-
ment:

    cases by rule cases on <stmt0_reduces>,
             not <stmt1_exceptional>,
             <exceptional>;

      // the lvalue reduces
      case "(lval0,s0) var_reduce(TE,p) (lval1,s1)"    <lval0_reduces>
           "stmt1 = RAssign(lval1,rexp0)"              <auto>;

      // the rvalue reduces
      case "(rexp0,s0) exp_reduce(TE,p) (rexp1,s1)"    <rexp0_reduces>
           "stmt1 = RAssign(lval0,rexp1)"              <auto>;

      // both are ground, so the assignment happens
      case "arr0 = RValue(RAddr(SOME(taddr)))"         <auto>
           "idx0 = RValue(RPrim(Int(k32)))"            <auto>
           "rexp0 = RValue(val)"                       <auto>
           "heap0(taddr) = cell"                       <cell>
           "cell = SOME(ARRAY(arrty,vec))"             <lookup>
           "idx = dest_int32(k32)"                     <auto>
           "idx >>= 0"                                 <auto>
           "idx < LEN(vec)"                            <auto>
           "typecheck((TE,heap0),val,arrty)"           <val_fits>
           "heap1 = heap0 <++ (taddr, ARRAY(arrty,REPL idx vec val))"  <auto>
           "stmt1 = RExpr(RVoid)"                      <auto>
           "frames1 = frames0"                         <auto>;

Here we see DECLARE’s third (and final!) proof language construct: decompo- 
sition into cases, perhaps introducing new constants in each case. Those used 
to tactic based theorem provers may find it difficult to believe that these three 
constructs are sufficient to express any proof, and even harder to believe that 
proofs end up simpler: this is discussed further in Appendix A. 

The cases may look daunting, but consider how far we have come with
this case-split: the justification for the split is based on the possibilities for
how stmt0 could have reduced (rule cases on <stmt0_reduces>), on the
fact that in the cases where an exception is produced the proof is trivial (not
<stmt1_exceptional>), and on the definition of what it means for a value to
be exceptional (<exceptional>). Thus we have eliminated all the cases in array
assignment where exceptions arise (there are three), as well as the 30 cases where
the reduction rules do not apply to array assignments (each of these requires some
proof).

Of the remaining cases, the first two correspond to redex rules for arrays, and 
their proofs use the induction hypotheses. The final case is the most interesting 
one: it is where both the left and right of the assignment are ground. The rest 
of the proof for this case is shown in Appendix D. 

The above proof text was arrived at by repeatedly refining an approximate
proof script, and also cut-and-pasting some reasoning steps from previously
proved cases. This process was repeated for all 36 major cases of the type sound-
ness proof. Typically, a first pass at formally checking a proof will result in
roughly:

— 50% of the steps (i.e. logical leaps) in the proof being accepted immediately;

— 25% of the steps requiring the addition of one or two supporting facts, and
perhaps some explicit instantiations;

— 25% of the cases requiring more thought than anticipated.

The success rate increases for cases that are very similar to previous ones. Ma- 
chine checking the proofs up to the lemmas that were outlined in Section 3.2 
took around 30 minutes to one hour per case: more for complex cases such as 
procedure call. The lemmas were initially assumed, and proofs given to them at 
a later stage. This sometimes involved refining the original proof script further, 
or adapting the model where it was found to be inadequate. 

The reader should note that although a very powerful automated routine 
may be able to find the entire proof for one of these cases after the fact, the very 
process of writing the proof corrected significant errors in the rough draft that 
would have confounded even the best prover. Increased automation gives us a 
diminishing return as we venture into areas where the correct formulation takes 
care to find. 

6 Errors Discovered 

In this section we describe an error in the Java language specification that we 
independently rediscovered during the course of this work. We also describe one 
major error and a noteworthy omission in Drossopoulou and Eisenbach’s original 
presentation of the type soundness proof. 

6.1 An Error in the Java Language Specification 

In the process of finishing the proofs of the lemmas described in Section 3.2 we 
independently rediscovered a significant flaw in the Java language specification 
that had recently been found by developers of a Java implementation [PB97]. In 
theory the flaw does not break type soundness, but the authors of the language 
specification have confirmed that the specification needs alteration. 

The problem is this: in Java, all interfaces and arrays are considered sub- 
types of the type Object, in the sense that a cast from an interface or array type 
to Object is permitted. The type Object supports several “primitive” meth-
ods, such as <object>.hashCode() and <object>.getClass() (there are 11
in total). The question is whether expressions whose static type is an interface 
support these methods. 

By rights, interfaces should indeed support the Object methods - any class
that actually implements the interface will support these methods by virtue of
being a subclass of Object, or an array. Indeed, the Sun JDK toolkit allows
calling these methods from static interface types, as indicated by the successful
compilation (but not execution) of the code:

    public interface I { }
    public class Itest {
        public static void main(String args[]) {
            I a[] = { null, null };
            a[0].hashCode();
            a[0].getClass();
            a[0].equals(a[1]);
        }
    }

However, the existing language specification states explicitly that interfaces only
support those methods listed in the interface or its superinterfaces, and that
there is no ‘implicit’ superinterface (i.e. there is no analogue of the ‘mother-of-
all-classes’ Object for interfaces). To quote:

The members of an interface are all of the following: 

— Members declared from any direct superinterfaces 
— Members declared in the body of the interface. 

There is no analogue of the class Object for interfaces; that is, while 
every class is an extension of class Object, there is no single interface of 
which all interfaces are extensions. 



[GJS96], pages 87 and 185 

The error was detected when trying to prove the existence of compatible methods 
and fields as we move from a type to a subtype, in particular from the type 
Object to an interface type. 
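
The point at issue can be seen in a small, self-contained sketch. The names Shape, Square and InterfaceObjectMethods are hypothetical and chosen only for illustration; the sketch assumes a compiler that, like the Sun JDK discussed above, accepts calls to Object methods on a receiver whose static type is an interface.

```java
// Hypothetical names, for illustration only.
interface Shape { }

class Square implements Shape { }

class InterfaceObjectMethods {
    // Calls three Object methods through a receiver whose static type is the interface.
    static boolean probe(Shape s) {
        int h = s.hashCode();        // declared in Object, not in Shape
        Class<?> c = s.getClass();   // likewise
        return s.equals(s) && c == Square.class && h == s.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(probe(new Square()));
    }
}
```

Unlike the Itest program, this version passes a non-null receiver, so it also executes without raising an exception.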



6.2 Runtime Typechecking, Array Assignments, and Exceptions 

In Drossopoulou and Eisenbach’s original formulation the type soundness prop- 
erty was stated along the following lines (emphasis added) : 

Theorem 2. If a well-typed term t is not ground, then it rewrites to some t'
(and a new state s and environment Γ). Furthermore, either t' eventually
rewrites to an exception, or t' has some narrower type than t, in the new state
and environment.

The iterated rewriting was an attempted fix for a problem demonstrated by the 
following program: 



void silly(C arr[], C s) {
    arr[1] = s;
}



At runtime, arr may actually be an array of some narrower type, say C’ where 
C’ is a subclass of C. Then the array assignment appears to become badly typed
before the exception is detected, because during the rewriting the left side be-
comes a narrower type than the right. Thus they allow the exception to appear 
after a number of additional steps. 

However, arr can become narrower, and the evaluation of the remaining subterms
can then fail to terminate! In that case an exception is never raised, e.g.



arr[loop()] = s;

The problem occurs in even simpler cases, e.g. when both arr and s have some 
narrower types C’ [] and C’. Then, after the left side is evaluated, the array 
assignment appears badly typed, but will again be well typed after the right 
side is evaluated. 
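
For comparison, in Java itself the operands of an array assignment are evaluated left to right (array reference, then index, then right-hand side), and the store check is applied only once all three are values. The following sketch records that order; the class EvalOrder and its helper methods are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EvalOrder {
    static final List<String> log = new ArrayList<>();

    // Each operand records when it is evaluated.
    static Object[] array() { log.add("array"); return new String[1]; }
    static int index()      { log.add("index"); return 0; }
    static Object rhs()     { log.add("rhs");   return Integer.valueOf(7); }

    // Evaluates array()[index()] = rhs() and reports the sequence of events.
    static String run() {
        log.clear();
        try {
            array()[index()] = rhs();   // an Integer does not fit into a String[]
            log.add("stored");
        } catch (ArrayStoreException e) {
            log.add("ArrayStoreException");
        }
        return String.join(",", log);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The right-hand side is evaluated before the exception is raised, so a right-hand side that never terminates would prevent the exception from ever appearing.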

Fixing this problem requires a different understanding of the role of the types 
we assign to rterms. Types for intermediary rterms only exist to help express 
the type soundness invariant of the abstract machine, i.e. to define the allowable 
states that a well-typed execution can reach. In particular, the array assignment 
rule must be relaxed to allow what appear to be badly typed assignments, but 
which later get caught by the runtime typechecking mechanism. 
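
The runtime typechecking mechanism referred to here can be observed directly in Java. The sketch below is an illustration only: CPrime stands in for the class C’ of the text, and ArrayStoreCheck and rejected are hypothetical names.

```java
class C { }
class CPrime extends C { }   // plays the role of C' in the text

class ArrayStoreCheck {
    static void silly(C[] arr, C s) {
        arr[1] = s;          // statically well typed; checked again at run time
    }

    // Returns true if the assignment was rejected by the runtime store check.
    static boolean rejected(C[] arr, C s) {
        try {
            silly(arr, s);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        C[] narrow = new CPrime[2];                           // dynamic type CPrime[]
        System.out.println(rejected(narrow, new C()));        // store check fails
        System.out.println(rejected(narrow, new CPrime()));   // store check succeeds
        System.out.println(rejected(new C[2], new C()));      // store check succeeds
    }
}
```

Only the first call is rejected: the static types are identical in all three calls, and the outcome depends solely on the dynamic types of the array and the stored value.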

This problem is an interesting case where the attempted re-use of typing 
rules in a different setting (i.e. the runtime setting rather than the typechecking 
setting) led to a subtle error, and one which we believe would only have been 
detected with the kind of detailed analysis that machine formalization demands. 
The mistake could not be missed in that setting! 

6.3 Side Effects on Types 

A significant omission in Drossopoulou and Eisenbach’s original proof was as 
follows: when a term has two or more subterms, e.g. arr[idx] := e, and arr 
makes a reduction to arr’, then the types of idx and e may change (become 
narrower) due to side effects on the state. This possibility had not originally 
been considered by Drossopoulou and Eisenbach, and requires a proof that heap 
locations do not change type (our notion of heap conformity suffices). We also 
need lemmas stating that typing is monotonic with respect to the $\le_{frame}$ and
$\le_{heap}$ relationship, up to the $\le_{wdn}$ relationship. The foremost of these lemmas
has been mentioned in Section 3.2. This problem was only discovered while doing 
detailed machine checking of the rough proof outline. 
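
A loose Java analogue of this situation is sketched below. It is not the formal model: the dynamic class of a stored value plays the role of the narrower runtime type, and Animal, Dog and SideEffectTypes are hypothetical names introduced for the illustration.

```java
class Animal { }
class Dog extends Animal { }

class SideEffectTypes {
    static Animal pet;                              // static type is Animal throughout
    static final Animal[] store = new Animal[1];

    // Evaluating the array subterm updates the state as a side effect.
    static Animal[] arr() {
        pet = new Dog();
        return store;
    }

    // Reports the runtime class of pet before the assignment
    // and the runtime class of the value actually stored.
    static String observe() {
        pet = new Animal();
        String before = pet.getClass().getSimpleName();
        arr()[0] = pet;          // arr() runs first, so pet is read after the update
        String after = store[0].getClass().getSimpleName();
        return before + " -> " + after;
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```

Reducing the left subterm changes what the right subterm evaluates to, which is why the typing of the remaining subterms has to be re-established in the new state.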

7 Summary 

This chapter has presented corrections to the semantics of Java_S, a machine
formalization of this semantics, a technique to partially validate the semantics, 
and an example of the use of new mechanized proof techniques to prove the type 
soundness property for that language. 

The work demonstrates how formal techniques can be used to help specify 
a major language. Java itself is far more complicated than Java_S, but we have
still covered a non-trivial subset. The formalization in Chapter 3 was the original
inspiration for this work. We suggest that in the long run theorem prover
specifications may provide a better format for the formalization, especially when 
flexible tools are provided to read, execute and reason about it. 

The disciplined approach enforced when writing a proof to be accepted by 
a mechanized tool ensures errors like those described in Section 6 are detected. 
The declarative proof language played a very useful role: it allowed the author 
to think clearly about the language while preparing the proof outlines for the 
computer. The first error was found when simply preparing the proof outline, 
rather than when checking it in detail. During this process of preparation the 
question ‘will a machine accept this proof?’ was always in the back of the author’s 
mind, and this ensured that unwarranted logical leaps were not made. 

The independent rediscovery of the mistake in the Java language specification 
described in Section 6.1 indicates that such errors can indeed be discovered by 
the process of formal proof. However, the mistake had already been discovered
by implementors attempting to follow the language specification precisely. 

It is commonly accepted that formal specification in a logic is of value when 
debugging specifications. This work has demonstrated that proof sketching and 
proof checking can also be of value, even while the theoretical framework for the 
language is still under development. It is interesting to note that of the three 
major errors, two were discovered at a late stage in the work. 



7.1 Related Work 

In the following chapter, Tobias Nipkow and David von Oheimb present their 
work on developing a proof of the type soundness property for a similar subset 
of Java in the Isabelle theorem prover. I am extremely grateful for the chance to 
meet with them and have adopted suggestions they have made. These two works 
are valuable ‘modern’ case studies of theorem proving methods applied to this kind
of problem. Isabelle is a mature system and has complementary strengths to 
DECLARE, notably strong generic automation and manifest soundness. A tool 
which unites these strengths with DECLARE’s is an exciting prospect. 



7.2 Future Work 

The model presented in this article has scope to be extended in many directions. 
The treatment could be expanded to encompass features such as exceptions, 
constructors, access modifiers, static fields and static methods without major 
problems, although this would involve a significant expansion in the size of the 
proofs. Features such as threads and Java’s semantically visible garbage collec- 
tion pose greater problems, but should also be possible. 

The work began as a case study for the application of a declarative proof 
language to operational reasoning, and there are ways in which DECLARE (or 
similar systems) could be improved based on this experience. The most necessary 
features are the implementation of decision procedures for ground equational 
reasoning (as in PVS [ORR+96]) and a small amount of ‘Computer Aided Proof
Writing’, as described briefly in [Sym97]. 
A A Brief Introduction to DECLARE

DECLARE is a proof checker for simple, polymorphic, higher order logic. It is 
designed to aid in the production of clear, readable, maintainable specification 
and proof documents. DECLARE is not a complete or polished system, and 
has been developed with the aim of testing various features that could be in- 
corporated in existing, supported theorem provers such as HOL, Isabelle and 
PVS. It has been influenced heavily by Mizar, HOL, HOL-lite, Isabelle and PVS 
[Rud92,ORR+96,GM93,Har96,Pau90]. It is not an LCF-style system: deductions 
are not reduced to a primitive logical framework, though in principle we are 
confident this is possible. The features of interest here are: 

— The declarative language used to express proof outlines. 

— The support for modularization, separate processing and top-down formal- 
ization, which leads to a well-structured, efficient working environment. 

— The automated proof support. 



A.1 The Proof Language

We try to achieve, by the simplest means possible, results that are both machine
checkable and human readable.*

DECLARE’s proof language was originally inspired by Mizar and work by
Harrison [Har97], but has been considerably streamlined. The language was
demonstrated by example in Section 5 and uses just three main constructs:

— Induction;

— Case-decomposition; and

— Justifications.

* Some researchers take the view that human readable proofs should be generated as
output from mechanized proofs: this may be possible, but it is a highly complex
process and the results are not yet convincing. Our approach is to make the input
readable in the first place.

Several other constructs are degenerate forms of these constructs: e.g. asserting
new facts, perhaps involving new local constants, is a degenerate form of
case-decomposition where there is only one case. Similarly, introducing an
abbreviation is a degenerate form of asserting new facts, where there is one new
fact and one new local constant.

Those used to tactic based theorem provers may find it difficult to believe 
that these three constructs are sufficient to express any proof, and even harder 
to believe that proofs end up simpler. It is clear that any higher order logic proof 
can indeed be expressed: we simply have to implement the basic proof rules of 
the logic within the default proof obligation checker. The key reasons why proofs 
end up simpler with DECLARE are: 

— (a) It provides excellent support for specifying complex reasoning deep within 
a logical context; 

— (b) Case splits may be based on a complex argument, rather than some 
simple syntactic criteria (as is usually the case in a tactic based theorem 
prover). Many trivial cases disappear without thought. 

— (c) The proof style encourages extensive use of abbreviations, as in written 
proofs, and gives easy control over variable and fact naming. A common 
accusation levied at declarative proof languages is that in large verifications 
terms get too large to be written out by hand. However, we would claim 
the exact opposite: in large interactive proofs, terms get so complex it is 
essential that a human be in charge of keeping the complexity under control.
This can be done through definitions, abbreviations and other conveniences 
both logical and notational. These are essential to the production of an 
elegant, clean and maintainable proof. 

Of course, none of DECLARE’s constructs are incompatible with tactics, but our 
experience indicates that adding more traditional tactic constructs into the proof 
language does not gain much, and has the potential to destroy many of the useful 
properties the language enjoys. 

Such proof languages are called declarative, to place them in contrast to 
‘procedural’ (often tactic based) mechanisms for specifying proofs. The main 
feature of a declarative language is that the machine works out the vast ma- 
jority of the syntactic manipulations necessary to achieve a proof (especially 
those associated with propositional connectives, first-order quantifiers and asso- 
ciative/commutative operators), leaving the user free to simply declare a seman- 
tic intent. The use of a declarative proof language has clear advantages: 

— Declarative proofs are more readable than tactic proofs. 

— Proof interpretation always terminates, unlike tactic proofs which are ex- 
pressed in a Turing-complete language. In particular guaranteed termination 
makes error recovery in proof checking more tractable. 

— Declarative proofs are potentially more maintainable under changes to the 
specification and the prover. 

— Declarative proofs are potentially more portable. Specification and proof 
documents developed with DECLARE are, in principle, portable to other 
proof systems and may even be interpreted in other sufficiently powerful
logics. 

— A declarative style appeals to a wider class of users, helping to deliver auto- 
mated reasoning and formal methods to mathematicians and others. 



A.2 The Working Environment

When using DECLARE, large bodies of work are broken into a series of articles, 
each of which may have an interface called an abstract. Articles are checked 
relative to the abstracts they import, and must ‘implement’ the abstract they 
export. Abstracts may be pre-compiled, which, in combination with the make 
system, gives us a simple, yet light-weight and effective means for maintaining 
the coherence of large collections of specifications and proofs. This approach also 
means DECLARE typically uses only 5-6 MB of memory when executing. 



A.3 Automation

DECLARE proofs are only proof outlines, and require automation to fill in the 
gaps in the argument. In this way the proof language acts as a bridge between 
the human and the automated capabilities of the proof checker. 

The automation we use in this chapter is fairly straightforward: 

— We use Boyer-Moore/Isabelle style simplification with conditional, higher- 
order rewriting to normalize expressions. Simplification is performed under 
the auspices of a two-sided sequent calculus like that used by PVS. During 
simplification we: 

• Apply safe introduction and elimination rules, e.g. choosing witnesses 
for existentials in assumptions; splitting disjuncts in goal formulae; and 
transferring negated formulae across the sequent. 

• Apply ‘unwinding’ rules to eliminate local constants from existential and 
universal formulae, including the sequent itself. 

• ‘Untuple’ all pair, tuple and record values. 

• Apply a large background set of (conditional) rewrites collected from 
imported abstracts; 

• Normalize arithmetic expressions; 

• Case-split on constructs such as conditionals and quantified structural 
variables (booleans, options etc.). 

• Use exploratory unwinding of some definitions, in the style of PVS. 

• Use controlled left-right simplification of certain ‘program-like’ 
constructs, which helps implement partial evaluation and avoids com- 
mon causes of non-terminating rewriting strategies. 

— After simplification we use a simple model elimination [Lov68] prover (with 
time, variable and depth limited iterative deepening) to search for values for 
unknowns. 







This level of automation has been sufficient during exploratory proof develop- 
ment, since in this most important stage we are content with guiding the prover 
through the proof without expecting complex steps, such as inductions, to be 
automated. The only significant problems arise when we venture into problem 
spaces that require significant equality and proof-search reasoning (this is still
a major research area), or equality reasoning not amenable to rewriting (adding 
congruence closure will solve this). 

Automation in DECLARE is guided by ‘pragmas’: lemmas are given once- 
only ‘how to use me’ declarations, and no weightings or other obscure hints are 
specified when a lemma is used. This helps ensure that proof documents are not 
overly reliant on quirks of the underlying prover, and are robust as the prover 
itself changes. 

B The Full Widening Rules 

These rules determine the widening (subtype) relation. 



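$$
\frac{\Gamma \vdash C \sqsubseteq_{class} C'}{\Gamma \vdash C \le_{sref} C'}
\qquad
\frac{\Gamma \vdash I \sqsubseteq_{int} I'}{\Gamma \vdash I \le_{sref} I'}
\qquad
\frac{}{\Gamma \vdash I \le_{sref} \mathrm{Object}}
$$

$$
\frac{\Gamma \vdash C \sqsubseteq_{class} C'' \qquad \Gamma \vdash C'' :_{imp} I \qquad \Gamma \vdash I \sqsubseteq_{int} I'}{\Gamma \vdash C \le_{sref} I'}
$$

$$
\frac{ty \in \textit{prim-types}}{\Gamma \vdash ty \le_{comp} ty}
\qquad
\frac{ty, ty' \in \textit{simple-ref-types} \qquad \Gamma \vdash ty \le_{sref} ty'}{\Gamma \vdash ty \le_{comp} ty'}
$$

$$
\frac{ty \in \textit{component-types} \qquad n > 0}{\Gamma \vdash ty[]^{n} \le_{arr} \mathrm{Object}}
\qquad
\frac{n > 0 \qquad \Gamma \vdash ty \le_{comp} ty'}{\Gamma \vdash ty[]^{n} \le_{arr} ty'[]^{n}}
$$

$$
\frac{ty, ty' \in \textit{array-types} \qquad \Gamma \vdash ty \le_{arr} ty'}{\Gamma \vdash ty \le_{ref} ty'}
\qquad
\frac{ty, ty' \in \textit{simple-ref-types} \qquad \Gamma \vdash ty \le_{sref} ty'}{\Gamma \vdash ty \le_{ref} ty'}
\qquad
\frac{ty \in \textit{ref-types}}{\Gamma \vdash \mathrm{nullT} \le_{ref} ty}
$$

$$
\frac{ty \in \textit{prim-types}}{\Gamma \vdash ty \le_{wdn} ty}
\qquad
\frac{ty, ty' \in \textit{ref-types} \qquad \Gamma \vdash ty \le_{ref} ty'}{\Gamma \vdash ty \le_{wdn} ty'}
$$



C The Full Traversal Rules

These rules determine what methods and fields are visible from a given class. The
relations evaluate graphs, which in well-formed environments determine partial
functions.

$$
\frac{\Gamma(C).flds(field) = ty}{\Gamma \vdash (C, ty) \in FDec(C, field)}
$$

$$
\frac{\Gamma(C) = (C_{sup}, flds, \ldots) \qquad flds(field) = \bot \qquad \Gamma \vdash (C', ty) \in FDec(C_{sup}, field)}{\Gamma \vdash (C', ty) \in FDec(C, field)}
$$

$$
\frac{\Gamma(C).flds(field) = ty}{\Gamma \vdash ((C, field), ty) \in FDecs(C)}
\qquad
\frac{\Gamma(C) = (C_{sup}, \ldots) \qquad \Gamma \vdash ((C', field), ty) \in FDecs(C_{sup})}{\Gamma \vdash ((C', field), ty) \in FDecs(C)}
$$

$$
\frac{\Gamma(C).meths(meth, at) = rt}{\Gamma \vdash ((meth, at), rt) \in MSigs_{C}(C)}
$$

$$
\frac{\Gamma(C) = (C_{sup}, meths) \qquad meths(meth, at) = \bot \qquad \Gamma \vdash ((meth, at), rt) \in MSigs_{C}(C_{sup})}{\Gamma \vdash ((meth, at), rt) \in MSigs_{C}(C)}
$$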

D The Proof for Ground Array Assignments 



case AssignToArray
  "lval0 = RAccess(arr0,idx0)" <auto>
  "stmt0 = RAssign(RAccess lval0,rexp0)" <auto>
  "∀exp1. (rexp0,s0) exp_reduce(TE,p) (exp1,s1)
      → exp_types(TE,FT0,heap0) rexp0_ty (exp1,s1)
        | exceptional_exp exp1" <ihyp_for_rexp0>
  "(TE,FT0,heap0) |- rexp0 rexp_hastype rexp0_ty"
      <rexp0_types_in_s0>
  "∀var1. (RAccess lval0,s0) var_reduce(TE,p) (var1,s1)
      → var_types(TE,FT0,heap0) (RAccess lval0) lval0_ty (var1,s1)
        | exceptional_var var1" <ihyp_for_lval0>
  "(TE,FT0,heap0) |- RAccess(arr0,idx0) rvar_hastype lval0_ty"
      <lval0_types_in_s0>;

cases by rule cases on <stmt0_reduces>,
                       not <stmt1_exceptional>,
                       <exceptional>;

// Case 1: the lvalue reduces
case "(lval0,s0) var_reduce(TE,p) (lval1,s1)" <lval0_reduces>
     "stmt1 = RAssign(lval1,rexp0)" <auto>;

// Case 2: the rvalue reduces
case "(rexp0,s0) exp_reduce(TE,p) (rexp1,s1)" <rexp0_reduces>
     "stmt1 = RAssign(lval0,rexp1)" <auto>;

// Case 3: both are ground, so the assignment happens
case "arr0 = RValue(RAddr(SOME(taddr)))" <auto>
     "idx0 = RValue(RPrim(Int(k32)))" <auto>
     "rexp0 = RValue(val)" <auto>
     "heap0(taddr) = cell" <cell>
     "cell = SOME(ARRAY(arrty,vec))" <lookup>
     "idx = dest_int32(k32)" <auto>
     "idx >>= 0" <auto>
     "idx < LEN(vec)" <auto>
     "typecheck((TE,heap0),val,arrty)" <val_fits>
     "heap1 = heap0 <++ (taddr,ARRAY(arrty,REPL idx vec val))" <auto>
     "stmt1 = RExpr(RVoid)" <auto>
     "frames1 = frames0" <auto>;

  // Because the lvalue is well-typed, its constituents must
  // also be well-typed. We need these facts to derive interesting
  // things about the content of the array we are assigning into.
  consider simpty,dim,ndim st
     "(TE,FT0,heap0) |- arr0 rexp_hastype
          SOME(VT(simpty,dim))" <arr0_types>
     "0N < dim" <auto>
     "(TE,FT0,heap0) |- idx0 rexp_hastype
          SOME(VT(PrimT(intT),0N))" <idx0_types>
     "ndim = dim-1N" <auto>
     "lval0_ty = VT(simpty,ndim)" <auto>
     by rule cases on <lval0_types_in_s0>;

  // The type of the target address
  // correlates with that of the vector.
  have "(TE,heap0) |- RAddr(SOME(taddr)) rval_hastype
            VT(simpty,dim)" <taddr_types>
     by rule cases on <arr0_types>;

  // The stuff stored at the target address is an array...
  consider dim',vec' st
     "dim = dim'+1N" <auto>
     "cell = SOME(ARRAY(VT(simpty,dim'),vec'))" <auto>
     by rule cases on <taddr_types>,<cell>;

  // And the array that's stored there looks exactly like we expect...
  then have "arrty = VT(simpty,dim')" <auto>
            "vec = vec'" <auto> by <lookup>;

  // Now the rhs: it's ground so it's really a value...
  consider rexp0_vty st
     "rexp0_ty = SOME(rexp0_vty)" <auto>
     "(TE,heap0) |- val rval_hastype rexp0_vty" <val_types>
     by rule cases on <rexp0_types_in_s0>;

  // Now we have everything we need to invoke our lemma that
  // assigning into an array maintains the necessary heap properties.
  have "TE |- heap1 wf_heap" <heap1_wf>
       "TE |- heap0 heap_conforms_to heap1" <heap1_conforms>
     by <array-assign-conforms-lemma> ["TE","heap1","heap0"],
        <heap0_conforms>,<val_fits>,<val_types>,<lookup>,<cell>;

  // And similarly for the frames: this is easy because they
  // don't change, and we just invoke the property that
  // frames_conforms_to is monotonic under heap_conforms_to.
  have "(TE,heap1) |- frames0 frames_conform_to
            FT0" <frames0_conform_in_s1>
     by <frames_conform-mono-lemma>,<frames0_conform>,
        <heap1_larger>,<heap0_wf>,<heap1_conforms>;

  // Finally we can derive the typing judgment for stmt1...
  have "(TE,FT0,heap1) |- stmt1 rstmt_hastype"
     by <rstatics Expr> ["NONE"], <rstatics Void>;

  // And we have everything we need to show the induction invariant
  // still holds.
  then qed by <heap1_wf>,<frames0_conform_in_s1>,
              <heap1_conforms>,<stmt_types>;

end
Abstract. In this article we present Bali, the formalization of a large
(hitherto sequential) sublanguage of Java. We give its abstract syntax, 
type system, well-formedness conditions, and an operational evaluation 
semantics. Based on these definitions, we can express soundness of the 
type system, an important design goal claimed to be reached by the 
designers of Java, and prove that Bali is indeed type-safe. 

All definitions and proofs have been done formally in the theorem prover 
Isabelle/HOL. Thus this article demonstrates that machine-checking the 
design of non-trivial programming languages has become a reality. 



1 Introduction 



Bali is a large subset of Java. This article presents its formalization
and the proof of a key property, namely the soundness of its type system,
specified and verified in the theorem prover Isabelle/HOL.

On the face of it, this article is mostly about Bali, its abstract syntax, type 
system, well-formedness conditions, and operational semantics, formalized as a 
hierarchy of Isabelle theories, and the structure of the machine-checked proof of 
type soundness and its implications. Although these technicalities do indeed take 
up much of the space, there is a meta-theme running through the article, which 
we consider even more important: the technology for producing machine-checked 
programming language designs has arrived. 

We emphasize that by ‘machine-checked’ we do not just mean that it has 
passed some type checker, but that some non-trivial properties of the language 
have been established with the help of a (semi-automatic) theorem prover. The 
latter process is still not a piece of cake, but it has become more than just 
feasible. Therefore any programming language intended for serious applications 
should strive for such a machine-checked design. The benefits are not just greater 
reliability, but also greater maintainability because the theorem prover keeps 
track of the impact that changes have on already established properties. 



This is a completely revised and extended version of 
Research supported by DFG SPP Deduktion. 



Jim Alves-Foss (Ed.): Formal Syntax and Semantics of Java, LNCS 1523, pp. 119-^^^ 1999. 
(c) Springer-Verlag Berlin Heidelberg 1999 
Note that the type-safety of Java is not sufficient to guarantee secure execution
of bytecode programs on the Java Virtual Machine, because the bytecode
might be tampered with, produced by a faulty compiler, or not be related to 
any Java source program at all. This was the main reason for introducing the 
Bytecode Verifier in the JVM, which checks the integrity, in particular type- 
correctness, of any bytecode before execution. Of course similar security prob- 
lems arise for any other high-level languages as well. Nevertheless, the investiga- 
tion of type-safety at source level is worthwhile: it checks whether the language 
design is sound, which means that at least all the necessary conditions express- 
ible at that level are fulfilled. In particular static typing loses much of its raison 
d’etre if the language is not type-safe. 



1.1 Related Work 



The history of type soundness proofs goes back to the subject reduction theorem
for typed λ-calculus but starts in earnest with Milner’s slogan ‘Well-typed
expressions do not go wrong’ in the context of ML. Milner uses a denotational
semantics, in contrast to most of the later work, including ours. The
question of type-safety came to prominence with the discovery of its failure in
Eiffel. Ever since, many designers of programming languages (especially
OO ones) have been at pains to prove type-safety of their languages (see, for
example, the series of papers by Bruce et al.).

Directly related to our work is that by Drossopoulou and Eisenbach,
who prove (on paper) type-safety of a subset of Java very similar to Bali.
Although we were familiar with an earlier version of their work and
have certainly profited from it, our work is not a formalization of theirs in
Isabelle/HOL but differs in many respects from it, for example in the
representation of programs and the use of an evaluation (aka “big-step”) semantics
instead of a transition (aka “small-step”) semantics. Simultaneously with our
work, Syme formalized the paper as far as possible, uncovering
two significant mistakes, both in connection with the use of transition semantics.
Syme uses his own theorem prover DECLARE, also based on higher-order logic.

There are two other efforts to formalize aspects of Java in a theorem prover.
Dean studies the interaction of static typing with dynamic linking. His
simple PVS specification addresses only the linking aspect and requires a
formalization of Java (such as our work provides) to turn his lemmas about linking into
theorems about the type-safety of dynamically linked programs. Cohen
has formalized the semantics of large parts of the Java Virtual Machine,
essentially by writing an interpreter in Common Lisp. He used ACL2, the latest
version of the Boyer-Moore theorem prover. No proofs have been reported
yet.
2 Overview

Bali includes the features of Java that we believe to be important for an inves- 
tigation of the semantics of a practical imperative object-oriented language: 

— class and interface declarations with instance fields and methods, 

— subinterface, subclass, and implementation relations 
with inheritance, overriding, and hiding, 

— method calls with static overloading and dynamic binding, 

— some primitive types, objects (including arrays), 

— exception throwing and handling. 

This portion of Java is very similar to that covered by and 

We do not consider Java packages and separate compilation. For the moment, 
we also leave out several features of Java like class variables and static methods, 
constructors and finalizers, the visibility of names, and concurrency, but we aim 
to include at least part of them in later stages of our project. Some constructs 
are simplified without limiting the expressiveness of the language (see 

In developing the formalization of Bali and investigating its properties, we 
aim to meet the following design goals: 

— faithfulness to the official language specification, 

— succinctness and simplicity, 

— maintainability and extendibility, 

— adequacy for the theorem prover. 

It might be interesting to keep these goals in mind while reading about the
formalization of Bali and about our proofs, and to check how far we have reached
them. We comment on our experience in pursuing these goals later in the article.

3 The Basics of Isabelle/HOL 

Before we present the formalization of Bali, we briefly introduce the underlying 
theorem proving system Isabelle/HOL. 

Isabelle/HOL is the instantiation of the generic interactive theorem prover
Isabelle with Church’s version of Higher-Order Logic and is very close
to Gordon’s HOL system. In this article HOL is short for Isabelle/HOL.

The appearance of formulas is standard, e.g. ‘⟶’ is the (right-associative)
infix implication symbol. Predicates are functions with Boolean result. Function
application is written in curried style. For descriptions we apply Hilbert’s choice
operator ε, where εx. P x denotes some value x satisfying P, or an arbitrary value
if no such x exists.

Logical constants are declared by giving their name and type, separated by
‘::’. Primitive recursive function definitions are written as usual. Non-recursive
definitions are written with ‘≝’.

Types follow the syntax of ML, except that the function arrow is ‘=>’. Type 
abbreviations are introduced simply as equations. A free datatype is defined by 
listing its constructors together with their argument types, separated by ‘|’. 



122 



David von Oheimb and Tobias Nipkow 



There are the basic types bool and int, as well as the polymorphic type (α)set
of homogeneous sets for any type α. Occasionally we apply the infix ‘image’
operator lifting a function over a set, defined as f “ S ≝ {y. ∃x∈S. y = f x}.

The product type α × β comes with the projection functions fst and snd.
Tuples are pairs nested to the right, e.g. (a,b,c) = (a,(b,c)).

As the list type (α)list is defined via its constructors [] denoting the empty
list and the infix ‘cons’ operator ‘#’, it can be introduced by the datatype
declaration

(α)list = [] | α # (α)list

The concatenation operator on lists is written as the infix symbol ‘@’. There
is a functional map :: (α => β) => (α)list => (β)list applying a function to all
elements of a list, as well as a conversion function set :: (α)list => (α)set.

We frequently use the datatype

(α)option = None | Some α

It has an unpacking function the :: (α)option => α such that the (Some x) = x
and the None = arbitrary, where arbitrary is an unknown value defined as εx. False.
There is a simple function mapping o2s :: (α)option => (α)set converting an
optional value to a set, with the characteristic equations o2s (Some x) = {x} and
o2s None = {}.
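
For readers more familiar with Java than with HOL, the helper functions above can be mirrored roughly as follows. This is only a sketch under stated assumptions: java.util.Optional stands in for the option type, the class OptionHelpers is a hypothetical name, and throwing an exception merely approximates the unspecified value of the None.

```java
import java.util.Collections;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class OptionHelpers {
    // o2s: None maps to the empty set, Some x maps to the singleton {x}.
    static <A> Set<A> o2s(Optional<A> o) {
        return o.isPresent() ? Collections.singleton(o.get())
                             : Collections.<A>emptySet();
    }

    // the: unpacks Some x. HOL leaves "the None" as an arbitrary value;
    // this sketch approximates that by throwing.
    static <A> A the(Optional<A> o) {
        return o.orElseThrow(() -> new NoSuchElementException("the None is unspecified"));
    }

    // image: the set of all f x for x in s.
    static <A, B> Set<B> image(Function<A, B> f, Set<A> s) {
        return s.stream().map(f).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(o2s(Optional.of(3)));
        System.out.println(o2s(Optional.empty()));
        System.out.println(the(Optional.of("x")));
    }
}
```

The main difference from the HOL versions is that HOL functions are total, whereas the Java version of the is partial and signals the missing case at run time.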

Most of the HOL text shown in this article is directly taken from the input
files. However, it has been massaged by hand to hide Isabelle idiosyncrasies, in- 
crease readability, and adapt the layout. Minor typos may have been introduced 
in the process. 

We adopt the following typographic conventions: Java keywords like catch 
appear in typewriter font, the names of logical constants like cfield appear in 
sans serif, while type names like state and meta-variables like v appear in italics.



4 The Formalization of Bali 

This section presents all aspects of our formalization of Bali.

As far as Bali is a subset of Java, it strictly adheres to the Java language 
specification with several generalizations: 

— we allow the result type of a method overriding another method to widen to 
the result type of the other method instead of requiring it to be identical. 

— if a class or an interface inherits more than one method with the same 
signature, the methods need not have identical return types. 

— no check of result types in dynamic method lookup. 

— the type of an assignment is determined by the right-hand side, which can 
be more specific than the left-hand side. 

^ The Isabelle sources are available from the Bali project page 



http://www.in.tum.de/isabelle/bali
We found several issues concerning exceptions not specified in the language
specification and therefore define a reasonable behavior that seems to be
consistent with current implementations:

— given a Null reference, the throw statement raises a NullPointer exception. 

— each system exception thrown yields a fresh exception object. 

— if there is not enough memory even to allocate an OutOfMemory error, program
execution simply halts. (Our experiments showed erratic behavior of
some implementation in this case, ranging from sudden termination without 
executing finally blocks, over hangup, to repeated invocation of a single 
exception handler!) 

To illustrate our approach, we use the following (artificial) example. 

class Base {
  boolean vee;
  Base foo(Base x) {
    return x;
  }
}

class Ext extends Base {
  int vee;
  Ext foo(Base x) {
    ((Ext)x).vee=1;
    return null;
  }
}

Base e;
e=new Ext();
try {e.foo(null);}
catch (NullPointerException x) {throw x;}

This program fragment consists of two simple but complete class declarations 
and a block of statements that might occur in any method that has access to 
these declarations. It contains the following features of Bali: 

— class declarations with inheritance, hiding of fields, and overriding of meth- 
ods (with refined result type), 

— return expressions, parameter access, 

— sequential composition, expression statements, field assignment, type cast, 
local accesses, literal values, exception propagation, 

— local assignment, instance creation,

— try & catch statement, method call (with dynamic binding), throw statement.

4.1 Abstract Syntax

First, we describe how we represent the syntax of Bali and which abstractions 
we have introduced thereby. 



Programs. A Bali program is a pair of lists of interface and class declarations: 
    prog = (idecl)list × (cdecl)list

Throughout the article, the symbol ‘Γ’ denotes a Bali program, as we use
programs as part of the static type context (see Sect. 4.2).

Each declaration is a pair of a name and the defined entity. Some names, like
those of predefined classes (including those of system exceptions xname), have
a predefined meaning and are therefore handled separately. We do not specify the
structure of names further, but use the opaque HOL types tname0, mname, and
ename0 for user-defined type names, method names, and “expression names”
(i.e. field and variable identifiers).



    xname = Throwable
          | NullPointer | OutOfMemory | ClassCast
          | NegArrSize | IndOutBound | ArrStore

    tname = Object            name of the top of the class hierarchy
          | SXcpt xname       name of a system exception
          | TName tname0      other class or interface name

    ename = this              special name for this pointer
          | EName ename0      expression name



An interface (iface) contains lists of superinterface names and method
declarations. A class specifies the names of an optional superclass and
implemented interfaces, and lists of field and method declarations.



    iface = (tname)list × (sig × mhead)list
    idecl = tname × iface

    class = (tname)option × (tname)list × (fdecl)list × (mdecl)list
    cdecl = tname × class



A field declaration (fdecl) simply gives the field type (ty, see Sect. 4.2). A
method declaration (sig × mhead for interfaces or mdecl for classes) consists of
a “signature” (Java language specification, Sect. 8.4.2), i.e. the method name
and the list of parameter types, excluding the result type, followed by mhead,
consisting of the list of parameter names and the result type, and (if it
appears within a class) the method body (mbody). The latter consists of the list
of local variables, a statement stmt as body, and a return expression expr (see
below). The separate return expression saves us from dealing with return
statements occurring in arbitrary positions within the method body. Such
statements may be replaced by assignments to a suitable result variable followed
by a control transfer to the end of the method body, using the result variable
as return expression. We provide a dummy result type and value for void methods.
For simplicity, up to now each method has exactly one parameter; multiple
parameters can be simulated by a single parameter object with multiple fields.



    field = ty                          field type
    fdecl = ename × field
    sig   = mname × ty                  method name and parameter type
    mhead = ename × ty                  parameter name and result type
    lvar  = ename × ty                  local variable name and type
    mbody = (lvar)list × stmt × expr    local vars, block, and return expression
    methd = mhead × mbody               method (of a class)
    mdecl = sig × methd



In the abstract syntax given above, the formalization of our example program 
fragment looks like this: 

    BaseC ≡ (Base, (Some Object,
                    [],
                    [(vee, PrimT boolean)],
                    [((foo, Class Base), (x, Class Base), ([], Skip, x))]))

    ExtC  ≡ (Ext, (Some Base,
                   [],
                   [(vee, PrimT int)],
                   [((foo, Class Base), (x, Class Ext), ([],
                      Expr({ClassT Ext}(Class Ext)x.vee := Lit (Intg 1)),
                      Lit Null))]))

    classes ≡ [ObjectC,
               SXcptC Throwable,
               SXcptC NullPointer, SXcptC OutOfMemory, SXcptC ClassCast,
               SXcptC NegArrSize, SXcptC IndOutBound, SXcptC ArrStore,
               BaseC, ExtC]

    tprg ≡ ([], classes)

    test ≡ Expr(e := new Ext);
           try Expr(e.foo({Class Base}Lit Null))
           catch((SXcpt NullPointer) x) (throw x)

where Base stands for TName Base_, Ext for TName Ext_, and similarly for
vee, x, and e. The constants Base_, Ext_, etc. are all distinct. The sequence of
statements test could have been embedded in tprg, which we have left out for
simplicity.

Representation of Lookup Tables. The representation of declarations as
lists gives an implicit finiteness constraint, which turns out to be necessary for 
the well-foundedness of the subclass and subinterface relation. The list repre- 
sentation also enables an explicit check whether the declared entities are named 
uniquely, implemented with the function unique given below. This function does 
not check for duplicate definitions, which is harmless. 

    unique :: (α × β)list ⇒ bool
    unique t ≡ ∀(x1,y1)∈set t. ∀(x2,y2)∈set t. x1 = x2 ⟶ y1 = y2
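
As a rough illustration only, the predicate can be mimicked in Python over a list of (name, entity) pairs. The function below is a sketch under that encoding, not part of the Isabelle sources; it shows why exact duplicates pass while conflicting declarations do not.

```python
def unique(decls):
    """Return True iff equal names never map to different entities."""
    seen = {}
    for name, entity in decls:
        if name in seen and seen[name] != entity:
            return False              # same name, different entity
        seen.setdefault(name, entity)
    return True

ok = unique([("Base", "decl1"), ("Ext", "decl2")])         # distinct names
dup = unique([("Base", "decl1"), ("Base", "decl1")])       # harmless duplicate
clash = unique([("Base", "decl1"), ("Base", "decl2")])     # conflicting entries
```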

For the lookup of declared entities, we transform declaration lists into ab- 
stract tables. They are realized in HOL as “partial” functions mapping names 
to values: 

    (α, β)table = α ⇒ (β)option

The empty table, pointwise update, extension of one table by another, the func- 
tion converting a declaration list into a table, and an auxiliary predicate relating 
entries of two tables, are defined easily: 

    empty                :: (α,β)table
    _[_↦_]               :: (α,β)table ⇒ α ⇒ β ⇒ (α,β)table
    _ ⊕ _                :: (α,β)table ⇒ (α,β)table ⇒ (α,β)table
    table_of             :: (α × β)list ⇒ (α,β)table
    _ hiding _ entails _ :: (α,β)table ⇒ (α,γ)table ⇒ (β ⇒ γ ⇒ bool) ⇒ bool

    empty              ≡ λk. None
    t[x↦y]             ≡ λk. if k = x then Some y else t k
    s ⊕ t              ≡ λk. case t k of None ⇒ s k | Some x ⇒ Some x
    table_of []        = empty
    table_of ((k,x)#t) = (table_of t)[k↦x]
    t hiding s entails R ≡ ∀k x y. t k = Some x ⟶ s k = Some y ⟶ R x y
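
The table operations translate almost literally into higher-order functions. The following Python sketch is an illustration under two assumptions that are not in the formalization: None stands for the HOL value None (so stored values must not themselves be None), and the quantifier over all keys is restricted to an explicitly supplied finite collection.

```python
def empty(key):
    """The empty table: no key is defined."""
    return None

def update(t, x, y):
    """t[x ↦ y]: pointwise update."""
    return lambda k: y if k == x else t(k)

def override(s, t):
    """s ⊕ t: entries of t take precedence over those of s."""
    def table(k):
        v = t(k)
        return s(k) if v is None else v
    return table

def table_of(decls):
    """Turn a declaration list into a table; the first entry for a key wins."""
    t = empty
    for k, x in reversed(decls):
        t = update(t, k, x)
    return t

def hiding_entails(t, s, r, keys):
    """t hiding s entails R, checked over a finite collection of keys."""
    return all(r(t(k), s(k)) for k in keys
               if t(k) is not None and s(k) is not None)

sup = table_of([("foo", "Base"), ("bar", "Base")])
sub = table_of([("foo", "Ext")])
merged = override(sup, sub)   # "foo" comes from sub, "bar" from sup
```

Representing tables as functions rather than dictionaries keeps the sketch close to the HOL definitions, at the cost of not being able to enumerate keys.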

For the union of tables, we also need the type of non-unique tables,

    (α,β)tables = α ⇒ (β)set

together with a union operator and straightforward variants of two of the notions 
defined above: 

    _ ⊕⊕ _                :: (α,β)tables ⇒ (α,β)tables ⇒ (α,β)tables
    Un_tables             :: ((α,β)tables)set ⇒ (α,β)tables
    _ hidings _ entails _ :: (α,β)tables ⇒ (α,γ)tables ⇒ (β ⇒ γ ⇒ bool) ⇒ bool

    Un_tables ts ≡ λk. ⋃t∈ts. t k
    s ⊕⊕ t       ≡ λk. if t k = {} then s k else t k
    t hidings s entails R ≡ ∀k. ∀x∈t k. ∀y∈s k. R x y
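
A corresponding sketch for non-unique tables, again in Python and again only illustrative: tables are functions from keys to Python sets, and the universally quantified key ranges over a finite collection passed in by the caller. The interface names J1, J2, and I in the usage lines are hypothetical.

```python
def un_tables(ts):
    """Un_tables ts: pointwise union of a collection of set-valued tables."""
    ts = list(ts)
    return lambda k: set().union(*(t(k) for t in ts))

def override_s(s, t):
    """s ⊕⊕ t: use t's entries where it has any, otherwise fall back to s."""
    return lambda k: t(k) if t(k) else s(k)

def hidings_entails(t, s, r, keys):
    """t hidings s entails R, checked over a finite collection of keys."""
    return all(r(x, y) for k in keys for x in t(k) for y in s(k))

# Two superinterfaces both declare a method "m"; the new interface adds "n".
j1 = lambda k: {("J1", "int")} if k == "m" else set()
j2 = lambda k: {("J2", "boolean")} if k == "m" else set()
own = lambda k: {("I", "int")} if k == "n" else set()

inherited = un_tables([j1, j2])
imethds_i = override_s(inherited, own)
```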

A simple application of type table is the translation of programs to tables 
indexed by interface and class names: 

    iface :: prog ⇒ (tname, iface)table ≡ table_of ∘ fst
    class :: prog ⇒ (tname, class)table ≡ table_of ∘ snd



More interesting are the following functions that traverse the type hierarchy
of a program, collecting the methods and fields into a table (the type tname
is defined above and ref_ty in Sect. 4.2):

    imethds :: prog ⇒ tname ⇒ (sig, ref_ty × mhead)tables
    cmethd  :: prog ⇒ tname ⇒ (sig, ref_ty × methd)table
    fields  :: prog ⇒ tname ⇒ ((ename × ref_ty) × field)list



Note that imethds collects a non-unique table of method declarations allowing 
for inheritance of more than one method with the same signature. 

As Syme points out, a naive recursive definition of these functions is
not possible in HOL because the class hierarchy might be cyclic, which is ruled
out for well-formed programs (see Sect. 4.3) only. This leads to partial
functions, which HOL does not support directly. Syme defines these functions as
relations instead. In contrast, we have chosen to define them as proper
functions, based on Slind’s work on well-founded recursion. We do not give their
definitions, but only the recursion equations, which we derive as easy
consequences:



    wf_prog Γ ∧ iface Γ I = Some (is, ms) ⟶
      imethds Γ I = Un_tables ((λJ. imethds Γ J) “ set is) ⊕⊕
                    (o2s ∘ table_of (map (λ(s,mh). (s, IfaceT I, mh)) ms))

    wf_prog Γ ∧ class Γ C = Some (sc, si, fs, ms) ⟶
      cmethd Γ C = (case sc of None ⇒ empty | Some D ⇒ cmethd Γ D) ⊕
                   table_of (map (λ(s,m). (s, (ClassT C, m))) ms)

    wf_prog Γ ∧ class Γ C = Some (sc, si, fs, ms) ⟶
      fields Γ C = map (λ(fn,ft). ((fn, ClassT C), ft)) fs @
                   (case sc of None ⇒ [] | Some D ⇒ fields Γ D)

The structure of the three equations is the same: the tables are constructed 
recursively from the corresponding tables of the superinterfaces or the superclass 
(if any), which models inheritance, augmented — with overriding — by the newly 
declared items. All declared items receive an extra label, namely their defining 
interface or class. 
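
To make the recursion equations for fields and cmethd concrete, here is a small Python sketch over the example hierarchy. The dictionary CLASSES, the string-valued types, and the placeholder method bodies are assumptions made for illustration; interfaces are omitted, and the hierarchy is assumed to be acyclic, as guaranteed for well-formed programs.

```python
# class name -> (superclass or None, field declarations, method declarations)
CLASSES = {
    "Object": (None, [], []),
    "Base": ("Object", [("vee", "boolean")], [(("foo", "Base"), "Base.foo body")]),
    "Ext": ("Base", [("vee", "int")], [(("foo", "Base"), "Ext.foo body")]),
}

def fields(c):
    """Own fields, labelled with the defining class, followed by inherited ones."""
    sc, fs, _ms = CLASSES[c]
    own = [((fn, c), ft) for fn, ft in fs]
    return own + ([] if sc is None else fields(sc))

def cmethd(c):
    """Inherited method table, overridden by the newly declared methods."""
    sc, _fs, ms = CLASSES[c]
    table = {} if sc is None else dict(cmethd(sc))
    for sig, m in reversed(ms):      # the first declaration wins, as in table_of
        table[sig] = (c, m)
    return table

ext_fields = fields("Ext")
ext_foo = cmethd("Ext")[("foo", "Base")]
```

Both fields named vee survive in ext_fields because the defining class is part of the key, whereas the overriding foo replaces the inherited entry in the method table.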

In our example, we obtain

    fields tprg Base = [((vee, ClassT Base), PrimT boolean)]
    fields tprg Ext  = [((vee, ClassT Ext),  PrimT int),
                        ((vee, ClassT Base), PrimT boolean)]
    cmethd tprg Base = empty[(foo, Class Base) ↦
                         (ClassT Base, (x, Class Base), ([], Skip, x))]
    cmethd tprg Ext  = empty[(foo, Class Base) ↦
                         (ClassT Ext, (x, Class Ext), ([],
                            Expr({ClassT Ext}(Class Ext)x.vee := Lit (Intg 1)),
                            Lit Null))]



Terms. We define statements (appearing in method bodies), expressions (ap- 
pearing in statements), and values (appearing in expressions) as recursive data- 
types. 

Statements are reduced to their bare essentials. We do not formalize syntactic 
variants of conditionals and loops. Neither do we consider jumps like the break 
statement. 

For a more modular description, we divide the try _ catch _ finally _ 
statement into a try _ catch _ statement and a _ finally _ statement, which 
might be used in any context. Our version of the try _ catch _ statement has 
exactly one catch clause. Multiple catch clauses can be simulated with cascaded 
if _ else _ statements applying the _ instanceof _ operator. 

    stmt = Skip
         | Expr expr
         | stmt; stmt
         | if (expr) stmt else stmt
         | while(expr) stmt
         | throw(expr)
         | try stmt catch(tname ename) stmt
         | stmt finally stmt

Skip denotes the empty statement. The “expression statement” Expr is a con- 
version from expressions to statements causing evaluation for side effects only. 
Assignments and method calls, which are expressions because they yield a value, 
can be turned into statements this way. In contrast to Java, for simplicity we 
allow this conversion to be applied to any kind of expression. 

Concerning expressions, our formalization leaves out the standard unary and
binary operators as their typing and semantics are straightforward. The this
expression is modeled as a special non-assignable local variable named this. 
The super construct can be simulated with a this expression that is cast to 
the superclass of the current class. Creation of multi-dimensional arrays can 
be simulated with nested array creation, while access and assignment to multi- 
dimensional arrays is nested anyway. 

It might be reasonable to introduce the general notion of variables (i.e. left-
hand sides of assignments) in order to factor out common behavior of local 
variables, class instance variables, and array components. But we have chosen 
not to do so because the semantic treatment of these three variants of variables 
differs considerably. This decision leads to some redundancy between access and 
assignment, especially in the semantics for arrays. 



    expr = new tname                      class instance creation
         | new ty[expr]                   array creation
         | (ty)expr                       type cast
         | expr instanceof ref_ty         type comparison operator
         | Lit val                        literal
         | ename                          local/parameter access
         | ename := expr                  local/parameter assignment
         | {ref_ty}expr.ename             field access
         | {ref_ty}expr.ename := expr     field assignment
         | expr[expr]                     array access
         | expr[expr] := expr             array assignment
         | expr.mname({ty}expr)           method call



The terms in braces { . . . } above are called type annotations. Strictly speaking, 
they are not part of the input language but serve as auxiliary information (com- 
puted by the type checker) that is crucial for the static binding of fields and 
the resolution of method overloading. Distinguishing between the actual input 
language and the augmented language would lead to a considerable amount of 
redundancy. We avoid this by assuming that the annotations are added before- 
hand by a kind of preprocessor. The correctness of the annotations is checked 
by the typing rules (see Sect. 4.2).



The definition of values is straightforward. It relies on the standard HOL
types of Boolean values (bool) and integers (int), whereas the type loc of
locations, i.e. abstract non-null addresses of objects, is not further specified.

    val = Unit         dummy result of void methods
        | Bool bool
        | Intg int
        | Null
        | Addr loc

The definitions below give some simple destructor functions for val with their 
characteristic properties. 

    the_Bool :: val ⇒ bool
    the_Intg :: val ⇒ int
    the_Addr :: val ⇒ loc

    the_Bool (Bool b) = b
    the_Intg (Intg i) = i
    the_Addr (Addr a) = a

4.2 Type System

This section defines types, various ordering relations between types, and the 
typing rules for statements and expressions. 

Types. We formalize Bali types as values of datatype ty, dividing them into 
primitive and reference types: 

    prim_ty = void
            | boolean
            | int

    ref_ty  = NullT
            | IfaceT tname
            | ClassT tname
            | ArrayT ty

    ty      = PrimT prim_ty
            | RefT ref_ty

void is used as a dummy type for methods without result. In the sequel NT
stands for RefT NullT, Iface I for RefT(IfaceT I), Class C for RefT(ClassT C),
and T[] for RefT(ArrayT T).

An interface or class type is considered as a proper type only if there is a 
corresponding declaration for its type name in the current program, which is 
checked by the following predicates: 

    is_iface :: prog ⇒ tname ⇒ bool
    is_class :: prog ⇒ tname ⇒ bool
    is_type  :: prog ⇒ ty ⇒ bool

    is_iface Γ tn ≡ iface Γ tn ≠ None
    is_class Γ tn ≡ class Γ tn ≠ None
    is_type Γ (PrimT _) = True
    is_type Γ NT        = True
    is_type Γ (Iface I) = is_iface Γ I
    is_type Γ (Class C) = is_class Γ C
    is_type Γ (T[])     = is_type Γ T

For all types, a default value is defined via 

    default_val :: ty ⇒ val
    default_val (PrimT void)    = Unit
    default_val (PrimT boolean) = Bool False
    default_val (PrimT int)     = Intg 0
    default_val (RefT r)        = Null
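
For readers who prefer executable notation, default_val can be mirrored in Python. The tuple encodings of types, such as ("PrimT", "int"), and of values, such as ("Intg", 0), are assumptions of this sketch rather than anything prescribed by the formalization.

```python
def default_val(ty):
    """Default value of a type given as a (constructor, argument) pair."""
    kind, arg = ty
    if kind == "PrimT":
        defaults = {
            "void": ("Unit",),
            "boolean": ("Bool", False),
            "int": ("Intg", 0),
        }
        return defaults[arg]
    return ("Null",)   # every reference type defaults to Null

int_default = default_val(("PrimT", "int"))
ref_default = default_val(("RefT", ("ClassT", "Base")))
```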

Type Relations. The relations between types depend on the interface and class
hierarchy of a given program Γ, and are therefore expressed with reference to Γ.
The direct subinterface (_ ⊢ _ ≺i1 _), subclass (_ ⊢ _ ≺c1 _), and
implementation (_ ⊢ _ ↝1 _) relations are of type
prog × tname × tname ⇒ bool and are defined as follows:

    Γ ⊢ I ≺i1 J ≡ is_iface Γ I ∧ is_iface Γ J ∧ J ∈ set (fst (the (iface Γ I)))
    Γ ⊢ C ≺c1 D ≡ is_class Γ C ∧ is_class Γ D ∧ Some D = fst (the (class Γ C))
    Γ ⊢ C ↝1 I  ≡ is_class Γ C ∧ is_iface Γ I ∧ I ∈ set (fst (snd (the (class Γ C))))

The transitive (but not reflexive) closures _ ⊢ _ ≺i _ and _ ⊢ _ ≺c _ can be
defined inductively:

    Γ ⊢ I ≺i1 K        Γ ⊢ I ≺i J;  Γ ⊢ J ≺i K
    ───────────        ───────────────────────
    Γ ⊢ I ≺i K         Γ ⊢ I ≺i K

    Γ ⊢ C ≺c1 E        Γ ⊢ C ≺c D;  Γ ⊢ D ≺c E
    ───────────        ───────────────────────
    Γ ⊢ C ≺c E         Γ ⊢ C ≺c E

There is also a kind of transitive closure of _ ⊢ _ ↝1 _ defined as

    Γ ⊢ C ↝1 J      Γ ⊢ C ↝1 I;  Γ ⊢ I ≺i J      Γ ⊢ C ≺c1 D;  Γ ⊢ D ↝ J
    ──────────      ───────────────────────      ───────────────────────
    Γ ⊢ C ↝ J       Γ ⊢ C ↝ J                    Γ ⊢ C ↝ J

The key relation is widening: Γ ⊢ S ≼ T, where S and T are of type ty, means
that S is a syntactic subtype of T, i.e. in any expression context (especially
assignments and method invocations) expecting a value of type T, a value of
type S may occur. Note that this does not necessarily mean that type S behaves
like type T, but only that it has a syntactically compatible set of fields and
methods. The widening relation is defined inductively as given below. Note that
some rules carry the additional premise that Object is a proper class, which will
be ensured for well-formed programs.

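    is_type Γ T                    is_type Γ (RefT R)
    ───────────                    ──────────────────
    Γ ⊢ T ≼ T                      Γ ⊢ NT ≼ RefT R

    Γ ⊢ I ≺i J                     is_iface Γ I;  is_class Γ Object
    ─────────────────────          ────────────────────────────────
    Γ ⊢ Iface I ≼ Iface J          Γ ⊢ Iface I ≼ Class Object

    Γ ⊢ C ≺c D                     Γ ⊢ C ↝ I
    ─────────────────────          ─────────────────────
    Γ ⊢ Class C ≼ Class D          Γ ⊢ Class C ≼ Iface I

    Γ ⊢ RefT S ≼ RefT T            is_type Γ T;  is_class Γ Object
    ──────────────────────────     ───────────────────────────────
    Γ ⊢ (RefT S)[] ≼ (RefT T)[]    Γ ⊢ T[] ≼ Class Object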
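
The widening rules can be read as a decision procedure. The Python sketch below covers the rules for primitive types, the null type, classes, and arrays over the example hierarchy. Interfaces are deliberately left out, and both the SUPER table and the tuple encoding of types are assumptions of the sketch.

```python
SUPER = {"Object": None, "Base": "Object", "Ext": "Base"}   # direct superclasses

def subcls(c, d):
    """Transitive, non-reflexive subclass relation."""
    sc = SUPER[c]
    while sc is not None:
        if sc == d:
            return True
        sc = SUPER[sc]
    return False

def is_type(t):
    kind, arg = t
    if kind in ("PrimT", "NT"):
        return True
    if kind == "Class":
        return arg in SUPER
    return is_type(arg)               # kind == "Array"

def is_ref(t):
    return t[0] != "PrimT"

def widens(s, t):
    """Does a value of type s fit wherever type t is expected?"""
    if not (is_type(s) and is_type(t)):
        return False
    if s == t:                        # reflexivity on proper types
        return True
    if s[0] == "NT":                  # null widens to any reference type
        return is_ref(t)
    if s[0] == "Class" and t[0] == "Class":
        return subcls(s[1], t[1])
    if s[0] == "Array":
        if t == ("Class", "Object"):  # every array widens to Object
            return True
        return (t[0] == "Array" and is_ref(s[1]) and is_ref(t[1])
                and widens(s[1], t[1]))
    return False

EXT, BASE, OBJ = ("Class", "Ext"), ("Class", "Base"), ("Class", "Object")
INT, NT = ("PrimT", "int"), ("NT", None)
```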

To allow for type casting we also have the casting relation, where Γ ⊢ S ≼? T
means that a value of type S may be cast to type T:

    Γ ⊢ S ≼ T         Γ ⊢ C ≺c D                 is_class Γ C;  is_iface Γ I
    ──────────        ──────────────────────     ───────────────────────────
    Γ ⊢ S ≼? T        Γ ⊢ Class D ≼? Class C     Γ ⊢ Class C ≼? Iface I

    Γ ⊢ RefT S ≼? RefT T             is_class Γ Object;  is_type Γ T
    ───────────────────────────      ───────────────────────────────
    Γ ⊢ (RefT S)[] ≼? (RefT T)[]     Γ ⊢ Class Object ≼? T[]

    is_iface Γ I;  is_iface Γ J;  ¬ Γ ⊢ I ≺i J;
    imethds Γ I hidings imethds Γ J entails
      (λ(m1,(pn1,rT1)) (m2,(pn2,rT2)). Γ ⊢ rT1 ≼ rT2)
    ───────────────────────────────────────────────
    Γ ⊢ Iface I ≼? Iface J

    is_iface Γ I;  is_class Γ C
    ───────────────────────────
    Γ ⊢ Iface I ≼? Class C



Typing Rules. Now we come to type-checking itself, which is expressed as a 
set of constraints on the types of expressions, relative to a type environment. 

An environment consists of a global part, namely a program Γ, and a local
part (written ‘Λ’), namely the types of the local variables including the current
class, i.e. the type of this:

    lenv = (ename, ty)table
    env  = prog × lenv

    prg :: env ⇒ prog ≡ λ(Γ,Λ). Γ
    lcl :: env ⇒ lenv ≡ λ(Γ,Λ). Λ

The well-typedness of statements and the typing of expressions are defined 
inductively relative to an environment. The typing of expressions is unique, as 
can be shown easily by rule induction. 

    _ ⊢ _ :: ◇  :: env ⇒ stmt ⇒ bool
    _ ⊢ _ :: _  :: env ⇒ expr ⇒ ty ⇒ bool

The type-checking rules for most statements are standard: 

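                         E ⊢ e :: T             E ⊢ c1 :: ◇;  E ⊢ c2 :: ◇
    ─────────────        ───────────────        ─────────────────────────
    E ⊢ Skip :: ◇        E ⊢ Expr e :: ◇        E ⊢ c1; c2 :: ◇

    E ⊢ e :: PrimT boolean;  E ⊢ c1 :: ◇;  E ⊢ c2 :: ◇
    ──────────────────────────────────────────────────
    E ⊢ if (e) c1 else c2 :: ◇

    E ⊢ e :: PrimT boolean;  E ⊢ c :: ◇        E ⊢ c1 :: ◇;  E ⊢ c2 :: ◇
    ───────────────────────────────────        ─────────────────────────
    E ⊢ while(e) c :: ◇                        E ⊢ c1 finally c2 :: ◇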

Note the use of the widening relation in the following two rules to ensure
that a value thrown or caught as an exception is indeed an exception object.

    E ⊢ e :: Class tn;  prg E ⊢ Class tn ≼ Class (SXcpt Throwable)
    ──────────────────────────────────────────────────────────────
    E ⊢ throw e :: ◇

    (Γ,Λ) ⊢ c1 :: ◇;  Γ ⊢ Class tn ≼ Class (SXcpt Throwable);
    Λ vn = None;  (Γ,Λ[vn↦Class tn]) ⊢ c2 :: ◇
    ─────────────────────────────────────────────────────────
    (Γ,Λ) ⊢ try c1 catch(tn vn) c2 :: ◇

The try _ catch _ statement is the only one that involves a change of the type 
environment, namely to include typing information for the exception parameter. 
The name of this parameter is required to be new in the local environment. 

The typing rules for the first few of the expressions are straightforward,
except for the confusing direction of the casting relation in the type cast rule:

    is_class (prg E) C          is_type (prg E) T;  E ⊢ i :: PrimT int
    ────────────────────        ──────────────────────────────────────
    E ⊢ new C :: Class C        E ⊢ new T[i] :: T[]

    E ⊢ e :: T;  prg E ⊢ T ≼? T'        typeof (λa. None) x = Some T
    ────────────────────────────        ────────────────────────────
    E ⊢ (T')e :: T'                     E ⊢ Lit x :: T

    E ⊢ e :: RefT T;  prg E ⊢ RefT T ≼? RefT T'
    ───────────────────────────────────────────
    E ⊢ e instanceof T' :: PrimT boolean

The rule for Lit prohibits addresses as literal values, which is implemented by
supplying λa. None as the “dynamic type” argument in the call of the function

    typeof :: (loc ⇒ ty option) ⇒ val ⇒ ty option
    typeof dt Unit     = Some (PrimT void)
    typeof dt (Bool b) = Some (PrimT boolean)
    typeof dt (Intg i) = Some (PrimT int)
    typeof dt Null     = Some (RefT NullT)
    typeof dt (Addr a) = dt a

This function is reused below with a more interesting value for the parameter 
dt, namely a function to compute the dynamic type of a reference. 
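
The role of the parameter dt is perhaps easiest to see in a small executable sketch. In the Python version below, values and types are encoded as tuples and Some/None as a plain value or Python None; these encodings, and the one-entry heap_types dictionary, are illustrative assumptions.

```python
def typeof(dt, v):
    """Type of value v; addresses are delegated to the dynamic-type function dt."""
    tag = v[0]
    if tag == "Unit":
        return ("PrimT", "void")
    if tag == "Bool":
        return ("PrimT", "boolean")
    if tag == "Intg":
        return ("PrimT", "int")
    if tag == "Null":
        return ("RefT", "NullT")
    return dt(v[1])                   # tag == "Addr"

no_addresses = lambda a: None         # plays the role of λa. None

# A hypothetical dynamic-type function backed by a one-object heap.
heap_types = {0: ("RefT", ("ClassT", "Ext"))}
dynamic = heap_types.get

static_addr = typeof(no_addresses, ("Addr", 0))   # rejected as a literal
runtime_addr = typeof(dynamic, ("Addr", 0))       # typed via the heap
```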

The typings of all three assignment variants are quite similar, except that for 
local variables additionally an assignment to this is forbidden. In any case, as a 
generalization to the Java specification, the type of the assignment is determined 
by the right-hand (as opposed to the left-hand) side. 

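    lcl E vn = Some T;  is_type (prg E) T
    ─────────────────────────────────────
    E ⊢ vn :: T

    E ⊢ vn :: T;  E ⊢ v :: T';  prg E ⊢ T' ≼ T;  vn ≠ this
    ──────────────────────────────────────────────────────
    E ⊢ vn := v :: T'

    E ⊢ e :: Class C;  cfield (prg E) C fn = Some (fd, fT)
    ──────────────────────────────────────────────────────
    E ⊢ {fd}e.fn :: fT

    E ⊢ {fd}e.fn :: T;  E ⊢ v :: T';  prg E ⊢ T' ≼ T
    ────────────────────────────────────────────────
    E ⊢ {fd}e.fn := v :: T'

    E ⊢ a :: T[];  E ⊢ i :: PrimT int
    ─────────────────────────────────
    E ⊢ a[i] :: T

    E ⊢ a[i] :: T;  E ⊢ v :: T';  prg E ⊢ T' ≼ T
    ────────────────────────────────────────────
    E ⊢ a[i] := v :: T'

    E ⊢ e :: RefT T;  E ⊢ p :: pT;
    max_spec (prg E) T (mn, pT) = {((md, (pn, rT)), pT')}
    ─────────────────────────────────────────────────────
    E ⊢ e.mn({pT'}p) :: rT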

The function cfield :: prog ⇒ tname ⇒ (ename, ref_ty × field)table, defined as

    cfield Γ C ≡ table_of (map (λ((n,d),t). (n,(d,t))) (fields Γ C)),

is a variant of fields. It implements a field lookup that is based on the field
name alone in contrast to a combination of field name and defining class. Thus
in the above typing rule for field access, equal field names hide each other,
while at run-time all fields are accessible, using the defining class as an
additional search key.
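
The difference between the static lookup by name and the run-time lookup by name and defining class can be sketched in Python as follows. The list FIELDS_EXT reproduces the value of fields tprg Ext from the example, with types abbreviated to strings; the dictionary-based encoding is an assumption of the sketch.

```python
FIELDS_EXT = [(("vee", "Ext"), "int"), (("vee", "Base"), "boolean")]

def cfield(fields_of_class):
    """Re-key a field list by name alone; the first (most derived) entry wins."""
    table = {}
    for (name, defining_class), ty in fields_of_class:
        table.setdefault(name, (defining_class, ty))
    return table

static_lookup = cfield(FIELDS_EXT)["vee"]          # what the type checker sees
runtime_lookup = dict(FIELDS_EXT)[("vee", "Base")]  # hidden field, still present
```

The static lookup yields the annotation (here the defining class Ext), which is exactly the extra key the run-time lookup needs to pick one of the two fields named vee.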

The type annotations { . . . } in the above rules for field access and method call 
are used to implement static binding for fields and to resolve overloaded method 
names statically. Technically speaking, the typing rules serve as constraints on 
these annotations during type-checking, but one can also think of the annotations 
being filled with schematic variables that are instantiated with their correct 
values in the type-checking process, as is demonstrated in the example overleaf. 
The value of each annotation is uniquely determined by the value of a function 
in the premise of the field access and method call rule: 

A field access {fd}e.fn is annotated with the defining class of the field found
when searching the class hierarchy for the name fn (using cfield), starting from
the static type Class C of e. The annotation {fd} will be used at run-time to
access the field via the pair (fn,fd).

A method call e.mn({pT'}p) is type-correct only if the function max_spec
determining the set of “maximally specific” (Java language specification,
Sect. 15.11.2) methods for reference type T (as defined below) yields exactly
one method entry. In this case, the method call is annotated by pT', which is
the argument type of the most specific method mn applicable according to the
static types T of e and pT of p. Thus any static overloading of the method name
mn has been resolved and the dynamic method lookup at run-time will be based on
the signature (mn,pT').



    max_spec    :: prog ⇒ ref_ty ⇒ sig ⇒ ((ref_ty × mhead) × ty)set
    appl_methds :: prog ⇒ ref_ty ⇒ sig ⇒ ((ref_ty × mhead) × ty)set
    mheads      :: prog ⇒ ref_ty ⇒ sig ⇒ (ref_ty × mhead)set
    more_spec   :: prog ⇒ (ref_ty × mhead) × ty ⇒ (ref_ty × mhead) × ty ⇒ bool

    max_spec Γ T sig ≡ {m | m ∈ appl_methds Γ T sig ∧
                            (∀m'∈appl_methds Γ T sig.
                               more_spec Γ m' m ⟶ m' = m)}

    appl_methds Γ T (mn, pT) ≡ {(m,pT') | m ∈ mheads Γ T (mn, pT') ∧
                                          Γ ⊢ pT ≼ pT'}

    mheads Γ NullT      = λsig. {}
    mheads Γ (IfaceT I) = imethds Γ I
    mheads Γ (ClassT C) = o2s ∘ option_map (λ(d,(h,b)). (d,h)) ∘ cmethd Γ C
    mheads Γ (ArrayT T) = λsig. {}

    more_spec Γ ((md,mh),pT) ((md',mh'),pT') ≡ Γ ⊢ RefT md ≼ RefT md' ∧
                                               Γ ⊢ pT ≼ pT'

where

    option_map :: (α ⇒ β) ⇒ (α option ⇒ β option)
    option_map f ≡ λy. case y of None ⇒ None | Some x ⇒ Some (f x)
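
The following Python sketch illustrates how max_spec singles out the most specific applicable method. It is heavily simplified: only class types are considered, widening is replaced by a reflexive walk up the SUPER table, and a method entry is reduced to a pair of defining class and parameter type (the mhead component is dropped). The two sets of overloads are hypothetical.

```python
SUPER = {"Object": None, "Base": "Object", "Ext": "Base"}

def widens(s, t):
    """Class-only stand-in for the widening relation (reflexive here)."""
    while s is not None:
        if s == t:
            return True
        s = SUPER[s]
    return False

def appl_methds(mheads, arg_type):
    """Entries whose parameter type accepts the static argument type."""
    return {(md, pt) for (md, pt) in mheads if widens(arg_type, pt)}

def more_spec(m1, m2):
    (md1, pt1), (md2, pt2) = m1, m2
    return widens(md1, md2) and widens(pt1, pt2)

def max_spec(mheads, arg_type):
    """Applicable entries that no other applicable entry is more specific than."""
    appl = appl_methds(mheads, arg_type)
    return {m for m in appl
            if all(m2 == m for m2 in appl if more_spec(m2, m))}

# Overloads visible in Ext: mn(Object) declared in Base, mn(Base) declared in Ext.
resolved = max_spec({("Base", "Object"), ("Ext", "Base")}, "Ext")
# Neither entry dominates the other: the call would be rejected as ambiguous.
ambiguous = max_spec({("Ext", "Object"), ("Base", "Base")}, "Ext")
```

A call is type-correct in the sense of the method call rule only in the first situation, where the result is a singleton.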



The well-typedness of our example code test is derived by a derivation tree
that, for formatting reasons, is cut at several positions; the positions are
marked with the labels of the cut subtrees (such as Call and LAss), and
irrelevant values in formulas are replaced by _.

[Derivation tree; root judgment:
 (Γ,Λ) ⊢ Expr(e := new Ext); try Expr(e.foo({?pT'}Lit Null))
         catch((SXcpt NullPointer) x) (throw x) :: ◇]

4.3 Well-Formedness

A program must satisfy a number of well-formedness conditions concerning 
global properties of all declarations. The conditions are expressed as predicates 
on field, method, interface, class, and whole program declarations. 



    wf_fdecl :: prog ⇒ fdecl ⇒ bool
    wf_mhead :: prog ⇒ sig × mhead ⇒ bool
    wf_mdecl :: prog ⇒ tname ⇒ mdecl ⇒ bool
    wf_idecl :: prog ⇒ idecl ⇒ bool
    wf_cdecl :: prog ⇒ cdecl ⇒ bool
    wf_prog  :: prog ⇒ bool



A field declaration is well-formed iff its type exists:

    wf_fdecl Γ (fn,ft) ≡ is_type Γ ft

A method declaration is well-formed only if its argument and result types 
are defined and the name of the parameter is not this. Additionally, if the 
declaration appears in a class, the names of the local variables must be unique 
and may not contain the special name this nor hide the parameter, all types 
of the local variables must exist, the method body has to be well-typed (in 
the static context of its parameter type and the current class), and its result 
expression must have a type that widens to the result type: 

    wf_mhead Γ ((mn,pT),(pn,rT)) ≡ is_type Γ pT ∧ is_type Γ rT ∧ pn ≠ this

    wf_mdecl Γ C ((mn,pT),(pn,rT),lvars,blk,res) ≡
      let ltab = table_of lvars;  E = (Γ, ltab[this↦Class C][pn↦pT])
      in wf_mhead Γ ((mn,pT),(pn,rT)) ∧
         unique lvars ∧ ltab this = None ∧ ltab pn = None ∧
         (∀(vn,T)∈set lvars. is_type Γ T) ∧
         E ⊢ blk :: ◇ ∧ (∃T. E ⊢ res :: T ∧ Γ ⊢ T ≼ rT)

Even more complex conditions are required for well-formed interface and class
declarations. The name of a well-formed interface declaration is not a class name.
All superinterfaces exist and are not subinterfaces at the same time. All methods
newly declared in the interface are named uniquely and are well-formed.
Furthermore, any method overriding a set of methods defined in some
superinterfaces has a result type that widens to all their result types:

    wf_idecl Γ (I,(is,ms)) ≡ ¬ is_class Γ I ∧
      (∀J∈set is. is_iface Γ J ∧ ¬ Γ ⊢ J ≺i I) ∧
      unique ms ∧ (∀m∈set ms. wf_mhead Γ m) ∧
      let mtab = Un_tables ((λJ. imethds Γ J) “ set is) in
      (o2s ∘ table_of ms) hidings mtab entails
        (λ(pn,rT) (md,(pn',rT')). Γ ⊢ rT ≼ rT')

Similarly, the name of a well-formed class declaration is not an interface
name. All implemented interfaces exist, and for any method of such an interface,
the class provides an implementing method with a possibly narrower return type.
All fields and methods newly declared in the class are named uniquely and are
well-formed. If the class is not Object, it refers to an existing superclass, which
is not a subclass of the current class. Furthermore, any method overriding a
method of the superclass has a compatible result type:

    wf_cdecl Γ (C,(sc,si,fs,ms)) ≡ ¬ is_iface Γ C ∧
      (∀I∈set si. is_iface Γ I ∧
         (∀s. ∀(m1,(pn1,rT1)) ∈ imethds Γ I s.
            ∃m2 pn2 rT2 b. cmethd Γ C s = Some (m2,(pn2,rT2),b) ∧
                           Γ ⊢ rT2 ≼ rT1)) ∧
      unique fs ∧ (∀f∈set fs. wf_fdecl Γ f) ∧
      unique ms ∧ (∀m∈set ms. wf_mdecl Γ C m) ∧
      (case sc of None ⇒ C = Object
                | Some D ⇒ is_class Γ D ∧ ¬ Γ ⊢ D ≺c C ∧
                    table_of ms hiding cmethd Γ D entails
                      (λ((pn1,rT1),b) (m,((pn2,rT2),b')). Γ ⊢ rT1 ≼ rT2))

Finally, all interfaces and classes declared in a well-formed program are 
named uniquely and are in turn well-formed. For uniformity, this includes the 
predefined class declarations of Object and the (flat) hierarchy of system excep- 
tions. 

    ObjectC   ≡ (Object, (None, [], [], []))

    SXcptC xn ≡ let sc = if xn = Throwable then Object else SXcpt Throwable in
                (SXcpt xn, (Some sc, [], [], []))

    wf_prog Γ ≡ let is = set (fst Γ);  cs = set (snd Γ)
                in ObjectC ∈ cs ∧ (∀xn. SXcptC xn ∈ cs) ∧
                   unique (fst Γ) ∧ (∀i∈is. wf_idecl Γ i) ∧
                   unique (snd Γ) ∧ (∀c∈cs. wf_cdecl Γ c)

Our example program tprg is well-formed. Here is a heavily abstracted derivation
tree of our proof of this fact.

    wf_mdecl tprg Base ((foo, Class Base),
      (x, Class Base), [], Skip, x)           ¬(tprg ⊢ Object ≺c Base)
    ──────────────────────────────────────────────────────────────────
    wf_cdecl tprg BaseC

    wf_mdecl tprg Ext ((foo, Class Base),
      (x, Class Ext), [], Expr({ClassT Ext}(Class Ext)
      x.vee := Lit (Intg 1)), Lit Null)       ¬(tprg ⊢ Base ≺c Ext)
    ───────────────────────────────────────────────────────────────
    wf_cdecl tprg ExtC

    wf_cdecl tprg BaseC     wf_cdecl tprg ExtC     Base ≠ Ext
    ─────────────────────────────────────────────────────────
    wf_prog tprg

4.4 Operational Semantics

We formalize the semantics of Java in operational style with evaluation rules. 
This is the natural choice since the language specification itself is given in an 
operational evaluation-oriented style, which allows for a direct formalization and 
its straightforward validation. Furthermore, a denotational semantics would re- 
quire much more difficult mathematical tools, and an axiomatic semantics would 
be problematic to validate and to use for reasoning on the language as a whole. 
We prefer an evaluation semantics to a transition semantics in order to obtain
a concise description, because we consider a transition semantics less readable
and rather low-level, which in particular holds for a formulation as an Abstract
State Machine (ASM).

In this section, we describe the notions of a state and its components and 
give the evaluation rules for statements and expressions. 

State. A state consists of an optional exception (of type xcpt), a heap, and a
current invocation frame, which holds the values of the local variables
(including method and exception parameters and the this pointer):

    state = (xcpt)option × st
    st    = heap × locals

    heap   :: st ⇒ heap   ≡ λ(h,l). h
    locals :: st ⇒ locals ≡ λ(h,l). l

Remember that tuples associate to the right, so if for some state σ we have an
equation like σ = (x, σ'), then x is the (optional) exception component alone,
while the second projection σ' of the state has (tuple) type st, i.e. represents a
“small” state excluding the exception entry.

An exception is a reference to an instance of some exception class, which is a 
subclass of Throwable. Normally, when an exception is thrown, a fresh exception 
object is allocated and its location returned to represent the exception. But in the 
case of system exceptions, we defer their allocation (and just record their names) 
until an enclosing catch block references them. This helps to avoid the subtleties
of (conditional) side effects on the heap and out-of-memory conditions. Thus we 
model exceptions as follows. 

    xcpt = XcptLoc loc
         | SysXcpt xname

A heap maps locations to objects, while local variables map names to values: 

    heap   = (loc, obj)table
    locals = (ename, val)table

In our model there is no need to explicitly maintain a stack of invocation frames 
containing local variables and return addresses for method calls. In this way we 
also abstract over the finiteness of stack space. On the other hand, we explicitly 
model the possibility of memory allocation on the heap to fail if there is no free 
location (i.e. some a with (heap σ) a = None) available. Memory allocation is
loosely, yet deterministically, defined by the function 

    new_Addr :: heap ⇒ (loc × (xcpt)option)option
    new_Addr h ≡ εy. (y = None ∧ (∀a. h a ≠ None)) ∨
                     (∃a x. y = Some (a,x) ∧ h a = None ∧
                            (x = None ∨ x = Some (SysXcpt OutOfMemory)))

This function fails, i.e. returns None, iff there is no free location on the heap,
and otherwise gives an unused location. At the latest when there is only one free
address left, it returns an OutOfMemory exception. In this way it is guaranteed
that when an OutOfMemory exception is thrown for the first time, there is a
free location on the heap to allocate it. Note that we do not consider garbage
collection.
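
Since new_Addr is specified only loosely via the choice operator, many concrete allocation strategies satisfy it. The Python sketch below shows one admissible choice, assuming a heap represented as a dictionary and a finite space of locations 0 .. capacity-1; both the capacity parameter and the lowest-free-address policy are assumptions of the sketch.

```python
def new_addr(heap, capacity):
    """Pick the lowest free address; flag OutOfMemory when it is the last one."""
    free = [a for a in range(capacity) if a not in heap]
    if not free:
        return None                              # no free location at all
    xcpt = ("SysXcpt", "OutOfMemory") if len(free) == 1 else None
    return (free[0], xcpt)

plenty = new_addr({}, 2)                 # two free cells: no exception yet
last = new_addr({0: "obj"}, 2)           # last free cell: exception is raised
full = new_addr({0: "obj", 1: "obj"}, 2) # heap exhausted
```

Raising the exception while one location is still free is what leaves room to allocate the OutOfMemory object itself.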

An object is either a class instance, modeled as a pair of its class name and a 
table mapping pairs of a field name and the defining class to values, or an array, 
modeled as a pair of its component type and a table mapping integers to values: 

    fields     = (ename × ref_ty, val)table
    components = (int, val)table

    obj = Obj tname fields
        | Arr ty components

    the_Obj :: (obj)option ⇒ tname × fields
    the_Arr :: (obj)option ⇒ ty × components
    obj_ty  :: obj ⇒ ty

    the_Obj (Some (Obj C fs)) = (C,fs)
    the_Arr (Some (Arr T cs)) = (T,cs)
    obj_ty (Obj C fs) = Class C
    obj_ty (Arr T cs) = T[]

Using obj_ty we define the predicate Γ,σ ⊢ v fits T, meaning that in the context
of Γ and state σ, the value v is assignable to a variable of type T. This
proposition, which is computed at run-time for type casts and array assignments,
is a weaker version of the notion of conformance introduced in Section 5.3:

_,_ ⊢ _ fits _ :: prog ⇒ st ⇒ val ⇒ ty ⇒ bool

Γ,σ ⊢ v fits T ≡ (∃pt. T = PrimT pt) ∨ v = Null ∨
                 Γ ⊢ obj_ty (the (heap σ (the_Addr v))) ≼ T



There are a number of auxiliary functions for constructing and updating the
state, namely:

lupd[_↦_] _ :: ename ⇒ val ⇒ st ⇒ st
hupd[_↦_] _ :: loc ⇒ obj ⇒ st ⇒ st
x_case      :: xcpt option ⇒ st ⇒ st ⇒ state

lupd[vn↦v] (h,l)  ≡ (h, l[vn↦v])
hupd[a↦obj] (h,l) ≡ (h[a↦obj], l)
x_case x σ' σ     ≡ (x, if x = None then σ' else σ)
init_vars :: (α × ty)list ⇒ (α, val)table
init_Obj  :: prog ⇒ tname ⇒ obj
init_Arr  :: ty ⇒ int ⇒ obj

init_vars    ≡ table_of ∘ map (λ(n,T). (n, default_val T))
init_Obj Γ C ≡ Obj C (init_vars (fields Γ C))
init_Arr T i ≡ Arr T (λj. if 0 ≤ j ∧ j < i then Some (default_val T)
                          else None)

raise_if :: bool ⇒ xname ⇒ (xcpt)option ⇒ (xcpt)option
np       :: val ⇒ (xcpt)option ⇒ (xcpt)option

raise_if c xn xo ≡ if c ∧ (xo = None) then Some (SysXcpt xn) else xo
np v             ≡ raise_if (v = Null) NullPointer

The definition of raise_if deserves a comment: raise_if c xn xo either propagates 
an already thrown exception xo or raises the system exception xn if c is true. 
As an application, np v checks for a null pointer access through the value v and 
throws a NullPointer exception in this case, but any other exception that has 
already occurred takes precedence. 
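
The behaviour of raise_if and np can be tried out in a few lines of Python. This is only a sketch: exceptions are encoded as plain strings (standing in for SysXcpt names), "no exception" as Python's `None`, and the `NULL` sentinel is a hypothetical stand-in for Bali's Null value.

```python
from typing import Optional


class _Null:
    """Hypothetical stand-in for Bali's Null value."""


NULL = _Null()


def raise_if(c: bool, xn: str, xo: Optional[str]) -> Optional[str]:
    # Raise xn only if the condition holds and no exception is pending;
    # otherwise propagate whatever was there before.
    return xn if (c and xo is None) else xo


def np(v: object, xo: Optional[str]) -> Optional[str]:
    # Null-pointer check expressed through raise_if.
    return raise_if(v is NULL, "NullPointer", xo)
```

For instance, `np(NULL, None)` yields `"NullPointer"`, whereas `np(NULL, "IndOutBound")` keeps the exception that occurred earlier.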



Evaluation Rule Format. Internally, the evaluation rules are given as mu- 
tually inductive sets of tuples. These sets define relations, which we present as 
predicates of the following form. 

- Γ ⊢ σ -c→ σ' :: prog ⇒ state ⇒ stmt ⇒ state ⇒ bool
  means that the execution of statement c transforms state σ into σ'.

- Γ ⊢ σ -e▷v→ σ' :: prog ⇒ state ⇒ expr ⇒ val ⇒ state ⇒ bool
  means that expression e evaluates to v, transforming σ into σ'.

Although defined as relations (for technical reasons), the semantics given below 
can be shown to be functional, i.e. deterministic. 

Strictly speaking it is not necessary to include an exception in the start state 
of a computation. Similarly, an expression needs only return either a value or 
an exception, but not both. However, the symmetry achieved by our slightly 
redundant model simplifies the rules considerably. In particular, we can avoid 
case distinctions on whether exceptions occur in intermediate states, which would 
cause the rules to be split. Suppose for example that Γ ⊢ σ -c→ σ' had the
signature prog ⇒ st ⇒ stmt ⇒ state ⇒ bool, i.e. all rules assume that there is
no exception in the start state. Then the rule(s) for sequential composition would
look like

Γ ⊢ σ0 -c1→ (None,σ1);  Γ ⊢ σ1 -c2→ σ2
----------------------------------------
Γ ⊢ σ0 -c1; c2→ σ2

Γ ⊢ σ0 -c1→ (Some xc,σ1)
------------------------------
Γ ⊢ σ0 -c1; c2→ (Some xc,σ1)

As a consequence of the design decisions just mentioned, there is exactly one
rule for each syntactic construct. Additionally there are general rules defining 
that exceptions simply propagate when a series of statements is executed or a 
series of expressions is evaluated: 



Γ ⊢ (Some xc,σ) -c→ (Some xc,σ)

Γ ⊢ (Some xc,σ) -e▷arbitrary→ (Some xc,σ)

All other rules can assume that no exception has been thrown in their respective
initial state. For such states, we define the abbreviation Norm σ, which stands
for (None,σ).
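
The effect of carrying an optional exception in every state can be seen in a toy interpreter. The sketch below is an illustration in Python, not the Bali semantics: the statement forms `assign` (with a constant) and `raise` (with an exception name) are hypothetical simplifications, and the store is a plain dictionary.

```python
from typing import Dict, Optional, Tuple

# (exception name or None, local variables)
State = Tuple[Optional[str], Dict[str, int]]


def execute(stmt: tuple, state: State) -> State:
    xcpt, store = state
    if xcpt is not None:
        # General rule: a pending exception simply propagates.
        return state
    kind = stmt[0]
    if kind == "skip":
        return state
    if kind == "seq":
        # One rule for c1; c2, without a case split on the intermediate state.
        return execute(stmt[2], execute(stmt[1], state))
    if kind == "assign":
        _, name, value = stmt
        return (None, {**store, name: value})
    if kind == "raise":
        return (stmt[1], store)
    raise ValueError(f"unknown statement: {kind}")
```

Running `x := 1; (raise E; x := 2)` from an empty normal state ends in `("E", {"x": 1})`: the second assignment is skipped by the propagation rule alone.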

Execution of Statements. The rules for the statements not explicitly involv- 
ing exceptions are obvious: 

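Γ ⊢ Norm σ -Skip→ Norm σ

Γ ⊢ Norm σ0 -c1→ σ1;  Γ ⊢ σ1 -c2→ σ2
--------------------------------------
Γ ⊢ Norm σ0 -c1; c2→ σ2

Γ ⊢ Norm σ0 -e▷v→ σ1
-------------------------
Γ ⊢ Norm σ0 -Expr e→ σ1

Γ ⊢ Norm σ0 -e▷v→ σ1;  Γ ⊢ σ1 -(if the_Bool v then c1 else c2)→ σ2
--------------------------------------------------------------------
Γ ⊢ Norm σ0 -if(e) c1 else c2→ σ2

Γ ⊢ Norm σ0 -if(e) (c; while(e) c) else Skip→ σ1
-------------------------------------------------
Γ ⊢ Norm σ0 -while(e) c→ σ1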

If no other exceptions have occurred while evaluating its argument and test- 
ing for a null reference (using np), the throw statement copies the evaluated 
location into the exception component of the state: 

Γ ⊢ Norm σ0 -e▷a'→ (x1,σ1);  x1' = np a' x1;
x1'' = (if x1' = None then Some (XcptLoc (the_Addr a')) else x1')
------------------------------------------------------------------
Γ ⊢ Norm σ0 -throw e→ (x1'',σ1)

For the semantics of the try _ catch _ statement we have to distinguish 
whether some exception is thrown and then caught by the catch clause or not. 
In the first case, i.e. there is an exception of appropriate dynamic type to be 
handled, the catch clause is executed with its exception parameter set to the 
caught exception. In the second case the catch clause is skipped. Because of 
technical limitations of the inductive definition package of Isabelle/HOL, even 
in this case we have to provide an occurrence of the execution relation, which in 
effect simply sets σ2 to (x1',σ1').

Γ ⊢ Norm σ0 -c1→ σ1;  Γ ⊢ σ1 -salloc→ (x1',σ1');
case x1' of None    ⇒ σ1'' = (x1',σ1') ∧ c2' = Skip
          | Some xc ⇒ let a = Addr (the_XcptLoc xc) in
                      if Γ,σ1' ⊢ a fits Class tn
                      then σ1'' = Norm (lupd[vn↦a] σ1') ∧ c2' = c2
                      else σ1'' = (x1',σ1') ∧ c2' = Skip;
Γ ⊢ σ1'' -c2'→ σ2
------------------------------------------
Γ ⊢ Norm σ0 -(try c1 catch(tn vn) c2)→ σ2

On the one hand, the exception parameter of the catch clause must represent
the exception thrown in the try block by a reference to its exception object.
On the other hand, since we defer the allocation of system exceptions when
evaluating expressions, we have to ensure that even for such exceptions a suitable
exception object is allocated on the heap of σ1, replacing the SysXcpt
entry by an XcptLoc entry in x1. This is achieved by the auxiliary relation
Γ ⊢ σ -salloc→ σ' :: prog ⇒ state ⇒ state ⇒ bool. If no system exception has
been thrown, the relation behaves like the identity on the state, and otherwise
allocates an exception object and modifies the state accordingly. Note that this
allocation step is impossible (and therefore program execution halts) if there
is no free address left.

Γ ⊢ Norm σ -salloc→ Norm σ

Γ ⊢ (Some (XcptLoc a), σ) -salloc→ (Some (XcptLoc a), σ)

new_Addr (heap σ) = Some (a,x);
xobj = init_Obj Γ (SXcpt (if x = None then xn else OutOfMemory))
------------------------------------------------------------------------
Γ ⊢ (Some (SysXcpt xn),σ) -salloc→ (Some (XcptLoc a), hupd[a↦xobj] σ)

The finally statement is similar to the sequential composition, but executes
its second clause regardless of whether an exception has been thrown in its first
clause or not. If an exception occurs in either clause, it is (re-)raised after the
statement, and if both parts throw an exception, the second one takes precedence
(the resulting exception is x1 only if x2 is None).

Γ ⊢ Norm σ0 -c1→ (x1,σ1);
Γ ⊢ Norm σ1 -c2→ (x2,σ2);
x2' = (if x1 ≠ None ∧ x2 = None then x1 else x2)
-------------------------------------------------
Γ ⊢ Norm σ0 -(c1 finally c2)→ (x2',σ2)
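
The side condition on the resulting exception in the finally rule can be tabulated directly. The following Python sketch is illustrative only; it encodes exceptions as strings and "no exception" as `None`, and the function name `finally_xcpt` is made up for this example.

```python
from typing import Optional


def finally_xcpt(x1: Optional[str], x2: Optional[str]) -> Optional[str]:
    """Side condition of the rule: if x1 != None and x2 == None then x1 else x2."""
    return x1 if (x1 is not None and x2 is None) else x2
```

Read literally, an exception from the first clause survives only when the second clause terminates normally; whenever the second clause raises an exception, that one is the result.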

Evaluation of Expressions. In contrast to the statement rules, almost all 
evaluation rules for expressions deserve some comments. 

Creating a new class instance means picking a free address a and updating 
the heap at that address with an object, the fields of which are initialized with 
default values according to their types. Note that the rule is not applicable — 
and therefore execution halts — if new_Addr fails. 

new_Addr (heap σ) = Some (a,x)
--------------------------------------------------------------
Γ ⊢ Norm σ -new C▷Addr a→ x_case x (hupd[a↦init_Obj Γ C] σ) σ

The same applies for the creation of a new array, where additionally an 
exception is raised if the length of the array is negative: 

Γ ⊢ Norm σ0 -e▷i'→ (x1,σ1);  i = the_Intg i';
new_Addr (heap σ1) = Some (a,x);
x1' = raise_if (i < 0) NegArrSize (if x1 = None then x else x1)
------------------------------------------------------------------------
Γ ⊢ Norm σ0 -new T[e]▷Addr a→ x_case x1' (hupd[a↦init_Arr T i] σ1) σ1

A type cast merely returns its argument value, but raises an exception if the
dynamic type happens to be unsuitable:

Γ ⊢ Norm σ0 -e▷v→ (x1,σ1);
x1' = raise_if (¬ Γ,σ1 ⊢ v fits T) ClassCast x1
------------------------------------------------
Γ ⊢ Norm σ0 -(T)e▷v→ (x1',σ1)

The type comparison operator checks if the type of its argument is assignable 
to the given reference type: 

Γ ⊢ Norm σ0 -e▷v→ σ1;
b = (v ≠ Null ∧ Γ,snd σ1 ⊢ v fits RefT T)
-----------------------------------------
Γ ⊢ Norm σ0 -e instanceof T▷Bool b→ σ1

The result of a literal expression is simply the given value:

Γ ⊢ Norm σ -Lit v▷v→ Norm σ

An access to a local variable (or the this pointer) reads from the local state 
component: 

Γ ⊢ Norm σ -vn▷the (locals σ vn)→ Norm σ

An assignment to a local variable updates the state, unless the evaluation of 
the subexpression raises an exception: 

Γ ⊢ Norm σ0 -e▷v→ (x,σ1);
σ1' = (if x = None then lupd[vn↦v] σ1 else σ1)
-----------------------------------------------
Γ ⊢ Norm σ0 -vn:=e▷v→ (x,σ1')

A field access reads from a field of the given object, taking into account 
the type annotation which yields the defining class of the field as determined 
statically. It also checks for null pointer access. 

Γ ⊢ Norm σ0 -e▷a'→ (x1,σ1);
v = the (snd (the_Obj (heap σ1 (the_Addr a'))) (fn,T))
-------------------------------------------------------
Γ ⊢ Norm σ0 -{T}e.fn▷v→ (np a' x1,σ1)

A field assignment acts accordingly: 

Γ ⊢ Norm σ0 -e1▷a'→ (x1,σ1);  a = the_Addr a';
Γ ⊢ (np a' x1,σ1) -e2▷v→ (x2,σ2);
(c,fs) = the_Obj (heap σ2 a);  obj = Obj c (fs[(fn,T)↦v])
-----------------------------------------------------------------
Γ ⊢ Norm σ0 -({T}e1.fn:=e2)▷v→ x_case x2 (hupd[a↦obj] σ2) σ2

An array access reads a component from the given array, but raises an ex- 
ception if the index is invalid: 

Γ ⊢ Norm σ0 -e1▷a'→ σ1;  Γ ⊢ σ1 -e2▷i'→ (x2,σ2);
vo = snd (the_Arr (heap σ2 (the_Addr a'))) (the_Intg i');
x2' = raise_if (vo = None) IndOutBound (np a' x2)
----------------------------------------------------------
Γ ⊢ Norm σ0 -e1[e2]▷the vo→ (x2',σ2)

Similarly, an array assignment updates the appropriate component, but first
has to check the type of the value to be assigned. Note one subtle difference to
field assignment: null pointer access is checked after evaluating the right-hand
side, whereas in field assignment the check occurs immediately after calculating
the reference.

Γ ⊢ Norm σ0 -e1▷a'→ σ1;  a = the_Addr a';
Γ ⊢ σ1 -e2▷i'→ σ2;  i = the_Intg i';
Γ ⊢ σ2 -e3▷v→ (x3,σ3);
(T,cs) = the_Arr (heap σ3 a);  obj = Arr T (cs[i↦v]);
x3' = raise_if (¬ Γ,σ3 ⊢ v fits T) ArrStore (
        raise_if (cs i = None) IndOutBound (np a' x3))
----------------------------------------------------------------
Γ ⊢ Norm σ0 -(e1[e2]:=e3)▷v→ x_case x3' (hupd[a↦obj] σ3) σ3

The most complex rule is the one for method invocation: after evaluating
e to the target location a' and p to the parameter value pv, the block blk and
the result expression res of method mn with argument type pT are extracted
from the program Γ (using the dynamic type dynT of the object stored at a').
For simplicity, we require local variables to be initialized with default values,
as the expensive rules for “definite assignment” (Ch. 16 of the language
specification) merely enable the run-time optimization that variables need not
be initialized before being explicitly assigned to. After executing blk and res
in the new invocation frame built from the local variables, the parameter pv
and a' as the value of this, the old invocation frame is restored and the result
value v returned:

Γ ⊢ Norm σ0 -e▷a'→ σ1;
Γ ⊢ σ1 -p▷pv→ (x2,σ2);
dynT = fst (the_Obj (heap σ2 (the_Addr a')));
(md,(pn,rT),lvars,blk,res) = the (cmethd Γ dynT (mn,pT));
Γ ⊢ (np a' x2,(heap σ2, init_vars lvars[this↦a'][pn↦pv])) -blk→ σ3;
Γ ⊢ σ3 -res▷v→ (x4,σ4)
---------------------------------------------------------------------
Γ ⊢ Norm σ0 -(e.mn({pT}p))▷v→ (x4,(heap σ4, locals σ2))

Note that all rules are defined carefully in order to be applicable even in not 
type-correct situations. For example, in any context where a value v is expected 
to be an address, we do not use a premise like v = Addr a as this will disable the 
rule if v happens to be, for example, a null pointer or a Boolean value. Instead,
we use an expression like a = the_Addr v, which will yield an arbitrary value if
v is not an address, yet will leave the rule applicable. In such cases we could
not prove anything useful about a, but during the type soundness proof itself 
it emerges that for well-formed programs (and statically well-typed statements 
and expressions) such situations cannot occur. A “defensive” evaluation throw- 
ing some artificial exception in case of type mismatches, which would require 
additional overhead, is therefore not necessary. 
5 The Proof of Type Soundness

In this section we discuss our type soundness theorem together with its cru- 
cial lemmas. As we spent almost half of the proof effort deriving properties 
of the type relations and the structure of well-formed programs, we dedicate
subsections of their own to them before introducing helpful notions concerning type
soundness, the main theorem itself, and interesting corollaries.

It is not surprising that many of the lemmas are similar to those given by
Drossopoulou and Eisenbach, since the necessity of certain lemmas emerges quite
naturally. On the other hand, the proof principles we use are sometimes rather
different from those outlined in their earlier paper, some of which were
inadequate.

5.1 Lemmas on the Type Relations 

There are two non-trivial lemmas concerning the type relations of Bali, namely 
the well-foundedness wf of the converse subinterface and subclass relations 

wf_prog Γ ⟶ wf (λ(J,I). Γ ⊢ I ≺i J) ∧
            wf (λ(D,C). Γ ⊢ C ≺c D)

and the frequently used transitivity of the widening relation:

wf_prog Γ ∧ Γ ⊢ S ≼ U ∧ Γ ⊢ U ≼ T ⟶ Γ ⊢ S ≼ T

The two relations are well-founded because they are finite and acyclic, where 
the former is a consequence of representing class and interface declarations as 
lists, and the latter follows from the irreflexivity of the relations, which in turn 
follows from the well-formedness of the classes and interfaces implied by the 
well-formedness of the whole program. 
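
The "finite and acyclic" argument can be made concrete for a finite class table. The Python sketch below is only an illustration under simplifying assumptions: the direct subclass relation is given as a dictionary from each class to its direct superclass (or `None`), interfaces are ignored, and the function name `is_acyclic` is invented for this example.

```python
from typing import Dict, Optional


def is_acyclic(direct_super: Dict[str, Optional[str]]) -> bool:
    """Check that following direct-superclass links never revisits a class."""
    for start in direct_super:
        seen = set()
        c: Optional[str] = start
        while c is not None:
            if c in seen:
                return False          # found a cycle
            seen.add(c)
            c = direct_super.get(c)   # undeclared classes end the chain
    return True
```

If the check succeeds, every chain of superclass links starting from a declared class terminates, which is what recursion over the hierarchy and the induction principles rely on.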

The well-foundedness facts are necessary for deriving the recursion equations 
for the functions that traverse the type hierarchy of a program (see Section 3) and
also give rise to induction principles for the (direct) subinterface and subclass
relations, e.g. the rule

wf_prog Γ;  P Object;
∀C D. C ≠ Object ∧ Γ ⊢ C ≺c D ∧ ... ∧ P D ⟶ P C
-------------------------------------------------
∀E. is_class Γ E ⟶ P E

means that for a well-formed program, if some property holds for class Object
and is preserved by the direct subclass relation, it holds for all classes.

Most lemmas, as well as auxiliary properties for deriving them, typically rely 
on several well-formedness conditions and are usually proved by rule induction 
on the type relation involved, or by applying the induction principles just mentioned.
For example, the transitivity of _ ⊢ _ ≼ _ is proved by rule induction on
the widening relation. It requires a well-formed program because it uses the
properties that every class widens to Object and that Object has neither a 
superclass nor a superinterface. 
5.2 Lemmas on Fields and Methods

For the type-safety of field accesses and method calls, characteristic lemmas 
concerning the field lookup and method lookup are required. They are used to 
relate the (static) types of fields and methods, as determined at compile-time, 
to the actual (dynamic) types that occur at run-time. 

For example, fields correctly referred to at compile-time must be found at run-time.
More formally, if a field access {T}e.fn, where e is of type Class C, statically
refers to a field of type fT defined in the reference type T, then within an instance
of some class C', which may be a subclass of C, the field can be (dynamically)
referred to using the same name and its defining class. In particular, there is no
dynamic binding for fields. This fact requires the following lemma:

wf_prog Γ ∧ cfield Γ C fn = Some (T,fT) ∧ Γ ⊢ Class C' ≼ Class C ⟶
  table_of (fields Γ C') (fn,T) = Some fT

Concerning method calls, a similar requirement preventing ‘method not understood’
errors can be formalized: if a method call of the form e.mn({pT}p)
with E ⊢ e::RefT T refers to a method that is statically available for the reference
e, the dynamic lookup of the object pointed at by e should yield a method with
a compatible result type. The lemma that helps to establish this behavior reads
as follows: for a well-formed program, a reference type T, and any class type T'
that widens to T, if T (statically) supports a method with a given signature,
then the (dynamic) type T' supports a method with the same signature and
whose result type widens to the result type of the first method:

wf_prog Γ ∧ (m1,(pn1,rT1)) ∈ mheads Γ T sig ∧ Γ ⊢ Class T' ≼ RefT T ⟶
  ∃m2 pn2 rT2 b. cmethd Γ T' sig = Some (m2,(pn2,rT2),b) ∧ Γ ⊢ rT2 ≼ rT1

The proofs of these lemmas are lengthy and require many auxiliary theorems 
that are proved by induction on the direct subclass relation, by case splitting on 
the right-hand argument of the widening relation and by rule induction on the 
subinterface, subclass, and implementation relation. 
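
To illustrate what the method lemma guarantees, here is a small Python sketch of a dynamic lookup that walks up the superclass chain. It borrows the name cmethd from the text, but the body, the `CLASSES` table and its entries are hypothetical; interfaces, parameter names and result-type widening are left out.

```python
from typing import Dict, Optional, Tuple

Sig = Tuple[str, Tuple[str, ...]]  # (method name, parameter types)

# class name -> (direct superclass, methods declared there: sig -> result type)
CLASSES: Dict[str, Tuple[Optional[str], Dict[Sig, str]]] = {
    "Object": (None, {}),
    "Shape": ("Object", {("name", ()): "Object"}),
    "Circle": ("Shape", {("name", ()): "Object"}),  # overrides name()
    "Square": ("Shape", {}),                        # inherits name()
}


def cmethd(cname: str, sig: Sig) -> Optional[Tuple[str, str]]:
    """Return (defining class, result type) of the first match, or None."""
    c: Optional[str] = cname
    while c is not None:
        sup, methods = CLASSES[c]
        if sig in methods:
            return (c, methods[sig])
        c = sup
    return None
```

Since `Shape` statically supports `name()`, the lookup succeeds for both subclasses: `Circle` finds its own override and `Square` finds the inherited definition.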

5.3 Type Soundness 

Finally, we state and prove the type soundness theorem. We motivate how we 
express type soundness, comment on the proof of the main theorem, and discuss 
its consequences.

Goal. Type soundness is a relation between the type system and the semantics 
of a language meaning that all values produced during any program execution 
respect their static types. This can be formulated as a preservation property: 
For all state transformations caused by executing a statement or evaluating an 
expression, if in the original state the contents of all variables “conform” to their 
respective types, this holds also for any final state. Additionally, if an expression 
yields some result, this value “conforms” to the type of the expression. Of course, 
we can only expect all this to hold if we assume a well-formed program and well- 
typed statements and expressions. 
It remains to specify what we mean exactly by ‘conforms’, which is inspired
by Relative to a given program Γ and a state σ, a value v conforms
to a type T, written Γ,σ ⊢ v ::≼ T, iff the dynamic type of v widens to T. Via
two auxiliary conformance concepts, this can be lifted to the notion of a whole
state σ conforming to an environment E. The proposition σ ::≼ E means that
the value of any accessible variable within the state is compatible with its static
type. Formally, these four concepts

- _,_ ⊢ _ ::≼ _ :: prog ⇒ st ⇒ val ⇒ ty ⇒ bool
  of a value conforming to a type,

- _,_ ⊢ _ [::≼] _ :: prog ⇒ st ⇒ (α,val)table ⇒ (α,ty)table ⇒ bool
  of all values in a table conforming to their respective types,

- _,_ ⊢ _ ::≼√ :: prog ⇒ st ⇒ obj ⇒ bool
  of all components of an object conforming to their respective types, and

- _ ::≼ _ :: state ⇒ env ⇒ bool
  of a state conforming to an environment

are defined as follows: 

Γ,σ ⊢ v ::≼ T ≡ let dyn_ty = option_map obj_ty ∘ heap σ
                in ∃T'. typeof dyn_ty v = Some T' ∧ Γ ⊢ T' ≼ T

Γ,σ ⊢ vs [::≼] Ts ≡ ∀n T. Ts n = Some T ⟶
                     (∃v. vs n = Some v ∧ Γ,σ ⊢ v ::≼ T)

Γ,σ ⊢ Obj C fs ::≼√ ≡ Γ,σ ⊢ fs [::≼] table_of (fields Γ C)
Γ,σ ⊢ Arr T cs ::≼√ ≡ Γ,σ ⊢ cs [::≼] option_map (λi. T) ∘ cs

(x,σ) ::≼ (Γ,Λ) ≡ Γ,σ ⊢ locals σ [::≼] Λ ∧
                  (∀a obj. heap σ a = Some obj ⟶ Γ,σ ⊢ obj ::≼√) ∧
                  (∀a. x = Some (XcptLoc a) ⟶ Γ,σ ⊢ Addr a ::≼ Class (SXcpt Throwable))

The expression (option_map obj_ty ∘ heap σ) a calculates the dynamic type of
the object (if any) at address a on the heap. Note that the conformance relation is 
defined such that it does not take into account inaccessible variables, i.e. values 
that occur in the state but not in the corresponding component of the static 
environment. Among others, this frees us from explicitly deallocating exception 
parameters after a catch clause. 
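
The following Python sketch mirrors the first two conformance notions on a drastically simplified model, purely as an illustration. The class hierarchy `SUPER`, the encoding of values (Python ints, `None` for Null, `("Addr", a)` for addresses), the single primitive type `"int"` and the heap that stores only the class of each object are all assumptions of this example, not part of the formalization.

```python
from typing import Dict, Optional, Tuple, Union

SUPER: Dict[str, Optional[str]] = {"Object": None, "A": "Object", "B": "A"}
Value = Union[int, None, Tuple[str, int]]  # Intg, Null, or ("Addr", a)
Heap = Dict[int, str]                      # address -> class of stored object


def widens(c: Optional[str], d: str) -> bool:
    """Class c widens to class d if d is reachable via superclass links."""
    while c is not None:
        if c == d:
            return True
        c = SUPER[c]
    return False


def conforms(heap: Heap, v: Value, ty: str) -> bool:
    if isinstance(v, int):
        return ty == "int"
    if ty == "int":
        return False
    if v is None:                 # Null conforms to every reference type
        return True
    return v[1] in heap and widens(heap[v[1]], ty)


def table_conforms(heap: Heap, vs: Dict[str, Value], ts: Dict[str, str]) -> bool:
    # Only names mentioned in the static table ts are checked.
    return all(n in vs and conforms(heap, vs[n], t) for n, t in ts.items())
```

Because `table_conforms` quantifies over the static table only, extra entries in the value table (such as a leftover catch parameter) do not affect the result.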

With the help of the notions just introduced, we can express the propositions 
we aim to prove as follows. In the context of a well-formed program, the execution 
of a well-typed statement transforms a state conforming to the environment into 
another state that again conforms to the environment: 

E = (Γ,Λ) ∧ wf_prog Γ ∧ E ⊢ s::◇ ∧ σ ::≼ E ∧ Γ ⊢ σ -s→ σ' ⟶ σ' ::≼ E

Analogously, the evaluation of a well-typed expression preserves the conformance 
of the state to the environment where, unless an exception has occurred, the value 
of the expression conforms to its static type: 

E = (Γ,Λ) ∧ wf_prog Γ ∧ E ⊢ e::T ∧ σ ::≼ E ∧ Γ ⊢ σ -e▷v→ (x',σ') ⟶
  (x',σ') ::≼ E ∧ (x' = None ⟶ Γ,σ' ⊢ v ::≼ T)

The validity of these two formulas will result as trivial corollaries from the 
main theorem, given next. 







Main Theorem and Proof. To prove the intended type soundness theorems 
given above, we utilize rule induction on the derivation on the execution of 
statements and the evaluation of expressions. As these depend on each other, we 
must deal with statements and expressions simultaneously. In addition, in order 
to obtain a suitable induction hypothesis, we have to strengthen the propositions 
by adding the auxiliary “heap extension” predicate _ ⊴ _ (defined below) and
introducing universal quantifications explicitly at several positions. As a result, 
the main theorem looks quite formidable, yet we attempt to cast it into words: 

wf_prog Γ ⟶
  (Γ ⊢ (x,σ) -c→ (x',σ') ⟶
     ∀Λ. (x,σ) ::≼ (Γ,Λ) ⟶
         (Γ,Λ) ⊢ c::◇ ⟶
         (x',σ') ::≼ (Γ,Λ) ∧ σ ⊴ σ')
  ∧
  (Γ ⊢ (x,σ) -e▷v→ (x',σ') ⟶
     ∀Λ. (x,σ) ::≼ (Γ,Λ) ⟶
         ∀T. (Γ,Λ) ⊢ e::T ⟶
         (x',σ') ::≼ (Γ,Λ) ∧ σ ⊴ σ' ∧ (x' = None ⟶ Γ,σ' ⊢ v ::≼ T))

For a well-formed program Γ, if the execution of a statement transforms one
state into another then for all local environments Λ, if the statement is well-typed
according to the environment (Γ,Λ) and the first state conforms to it, so
does the second state, and the new heap is an extension of the old one. The same 
holds for expressions, but additionally the value of the expression conforms to 
its type, in case there is no exception. 

The “heap extension” is a pre-order on states of type st ⇒ st ⇒ bool, where
σ ⊴ σ' means that any object existing on the heap of σ also exists on σ' and
has the same type there. (If we considered garbage collection, we would have 
to restrict this proposition to accessible objects.) The heap extension property 
holds for any transition of the operational semantics, which turns out to be 
necessary in our inductive proof. 

σ ⊴ σ' ≡ ∀a obj. heap σ a = Some obj ⟶
           ∃obj'. heap σ' a = Some obj' ∧ obj_ty obj' = obj_ty obj
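
As a minimal illustration of the heap extension predicate, the Python sketch below represents a heap as a dictionary from addresses to pairs of (dynamic type, contents). This encoding and the function name `heap_extends` are assumptions made for the example.

```python
from typing import Dict, Tuple

Obj = Tuple[str, dict]  # (dynamic type, field or component contents)


def heap_extends(old: Dict[int, Obj], new: Dict[int, Obj]) -> bool:
    """Every object of the old heap still exists in the new heap with the same type."""
    return all(a in new and new[a][0] == ty for a, (ty, _) in old.items())
```

Field updates and fresh allocations preserve the predicate, whereas removing an object or changing its dynamic type violates it.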

The proof of the main type soundness theorem is by far the heaviest. At the 
top level, it consists of currently 21 cases, one for each evaluation rule, where 

— 8 cases can be solved rather directly (e.g. from the induction hypothesis), 

— 7 cases require just simple lemmas on the structure of the state, and 

— the remaining 6 cases require extensive reasoning on the characteristic prop- 
erties of the constructs concerned. 

Most of this reasoning is independent of the operational semantics itself and can 
be tackled separately, which keeps the main proof manageable. 
Consequences. A corollary of type soundness is that method calls always execute
a suitable method, i.e. a ‘method not understood’ run-time error is impossible.
This property can be stated more formally: for a well-formed program
and a state that conforms to the environment, if an expression of reference type 
(which plays the role of the target expression for the method call considered) 
evaluates without an exception to a non-null reference, and if for that (static) 
type and a given signature a method is available, the dynamic method lookup 
for the same signature according to the class instance pointed at by the reference 
value yields a proper method body: 

E = (Γ,Λ) ∧ wf_prog Γ ∧ E ⊢ e::RefT T ∧ σ ::≼ E ∧ Γ ⊢ σ -e▷a'→ Norm σ' ∧
a' ≠ Null ∧ dynT = fst (the_Obj (heap σ' (the_Addr a'))) ∧
mheads Γ T sig ≠ {} ⟶ ∃m. cmethd Γ dynT sig = Some m

This implies that in a well-formed context, in every instance of the evaluation
rule for method calls, the function cmethd returns a proper method body. 

As it stands, the type soundness theorem does not directly say anything about 
non-terminating computations, which might lead to the conclusion that it is 
useless for the type-safety of reactive systems and looping programs. Fortunately, 
the theorem guarantees type-safety even in such cases if one accepts the following 
meta-level reasoning. An infinite computation can be interrupted after any finite 
number of computation steps, for example by introducing a counter of steps 
and raising an exception when a given value has been reached. The theorem 
implies that the state resulting from interrupting the computation after any finite 
number of statements executed conforms to the environment. Together with 
the fact that there is no single non-terminating statement, the whole (infinite) 
computation can be concluded to be type-safe. 
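
The meta-level argument about interrupting a computation can be pictured with a step-bounded loop. The Python sketch below is an assumption-laden illustration, not the construction used in the proof: the store is a dictionary, loop condition and body are Python callables, and the exception name `StepLimit` is hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Store = Dict[str, int]


def run_while(cond: Callable[[Store], bool],
              body: Callable[[Store], Store],
              store: Store,
              max_steps: int) -> Tuple[Optional[str], Store]:
    """Run a while loop, raising an artificial exception after max_steps iterations."""
    steps = 0
    while cond(store):
        if steps == max_steps:
            return ("StepLimit", store)  # interrupted: still a finite computation
        store = body(store)
        steps += 1
    return (None, store)
```

Every interrupted run is a finite computation, so a preservation property of the kind stated by the theorem applies to the state at the interruption point, whatever bound is chosen.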

In addition to the evaluation semantics, we plan to define a transition seman- 
tics and prove both styles equivalent (for finite computations). The transition 
semantics will be less concise and abstract, but allows type soundness to be 
formulated as a subject reduction property, which is more natural for infinite 
computations. More importantly, it seems to be unavoidable to describe concur- 
rency (and I/O). 

6 Experience and Statistics 

Recalling our design goals stated earlier, we comment on how far we have
reached them and share some of the lessons learned during the project.

Faithfulness to the official language specification. HOL’s expressiveness 
enables us to formalize the Java specification quite naturally and directly, 
without facing any severe obstacles. There is almost a one-to-one correspon- 
dence between the concepts given in the specification and those defined in 
Bali. As far as we could tell, all the messy well-formedness conditions in- 
herited from the language specification are actually needed somewhere in 
the proofs. This inspires confidence in the adequacy of both the specification 
and our formalization. 
We do not yet have tools for automatically generating executable code from
our theories, which would be an additional help in validating our formalization.
The importance of such a mechanism became very obvious when we
uncovered a mistake in our formalization (which was not present in 
but was introduced by modifications) when symbolically executing the ex- 
ample in this article in Isabelle: the list returned by function fields was in 
reverse order. Although the type soundness proof itself was an excellent de- 
bugging mechanism which caught many minor and some major mistakes, it 
failed to detect the wrong order because type soundness is independent of 
the order in which fields are inherited. In the original language specification 
we did not find any significant errors, but some omissions and unneeded 
restrictions, which we lifted. 

Succinctness and simplicity. Our policy to restrict the number of features 
considered and to make straightforward simplifications that do not diminish 
the expressiveness of the language has led to a clear and straightforward
formalization. Mixfix syntax and mathematical fonts as offered by Isabelle 
also contribute greatly to moderately readable definitions and theorems. 

The facility to conduct concise proofs strongly depends on the formalization.
In our case, the use of the (also more elegant) evaluation semantics
saved us from a lot of trouble, while the intricacies of a transition semantics
faced by Drossopoulou and Eisenbach led to several mistakes that were finally
corrected during Syme’s machine-checked proof, but at the expense of
additional concepts.

Maintainability and extendibility. Unless the language changes drastically, 
modifications tend to be of a local nature, but only if both the formalization 
and the proofs are reasonably structured. As always, modularity is the key 
issue. But when the formalization is extended, even well-structured proofs 
need to be modified, which remains a tedious job. Higher-level proof scripts 
and more automation are some of the answers. A dedicated mechanism for 
change management exploring and fixing the impact of modifications would 
also help. 

We are reasonably happy with the modularity of our work. For instance, 
Martin Büchi has adopted the formalization (including the proofs),
extended it to handle compound types, and proved the type-safety of the
augmented language, all of which worked very smoothly.

Adequacy for the theorem prover. Theorem provers are notoriously sensi- 
tive to the precise formulation of definitions and theorems. Thus the two 
goals of maximal automation of proofs and maximal abstractness of defini- 
tions are sometimes in conflict. In a number of cases this meant that although 
we could start with an abstract definition, we had to derive consequences 
which were better suited for the available proof procedures. Although we 
are far from satisfied with the current status of Isabelle’s proof procedures 
(for example, the handling of assumptions during simplification, or the ne- 
cessity to expand tuples and similar datatypes by hand), they are basically 
adequate for the task at hand. Nevertheless, more automation is necessary 
and feasible by extending the capabilities of Isabelle itself. 
Statistics. We spent two months (estimated net time) developing and maintaining
our formalization, and the Isabelle theory files produced add up to about
1200 lines of well-documented definitions. To conduct and maintain the type 
soundness proof with all necessary lemmas, it took us roughly three months of 
work and about 2400 lines of proof scripts. 



7 Conclusion 

The reader has been exposed to large chunks of a formal language specification 
and a proof of type soundness and may need to be reminded of the benefits. Even 
including the slight generalizations mentioned earlier, we did not
discover a loop-hole in the type system. But we had not seriously expected this 
either. So what have we gained over and above a level of certainty far beyond 
any paper-and-pencil proof? 

We view our work primarily as an investment for the future. For a start, 
it can serve as the basis for many other mechanized proofs about Java, e.g. as 
a foundation for the work by Dean or for compiler correctness. More
importantly, we see machine-checked proofs as an invaluable aid in maintaining
large language designs (or formal documents of any kind). It is all very well to 
perform a detailed proof on paper once, but in the face of changes and extensions, 
the reliability of such proofs begins to crumble. In contrast, we developed the 
design incrementally, and Isabelle reminded us where proofs needed to be modified.
This has proved to be important, for example when we extended Bali with
full exception handling. It will continue to help us further: apart from adding 
the last important Java features missing from Bali, e.g. threads, we also plan to 
use Bali as a vehicle for experimental extensions of Java such as parameterized 
types.

Despite our general enthusiasm for machine-checked language designs, a few 
words of warning are in order: 



- Bali is still a half-way house: not a toy language any more, but missing
  many details and some important features of Java.

- The Java type system is, despite subclassing, simpler than that of your
  average functional language: whereas the type checking rules of Java are
  almost directly executable, the verification of ML’s type inference algorithm
  against the type system requires a significant effort. The key complication
  there is the presence of free and bound type variables, which requires
  complex reasoning about substitutions. VanInwegen reports similar
  difficulties in her formalization of the type system and the semantics of ML.

- Theorem provers, and Isabelle is no exception, require a certain learning
  effort due to the machine-oriented proof style. Recent moves towards a more
  human-oriented proof style like Syme’s DECLARE system promise
  to lower this hurdle. However, as Harrison points out, both proof
  styles have their merits, and we are currently investigating a combination.


In a nutshell: although machine-checked language designs for the masses are still
some way off, this article demonstrates that they have definitely become a viable 
option for the expert. 

Abstract. A structural operational semantics of a significant sublanguage
of Java is presented, including the running and stopping of threads,
thread interaction via shared memory, synchronization by monitoring 
and notification, and sequential control mechanisms such as exception 
handling and return statements. The operational semantics is paramet- 
ric in the notion of “event space” [6], which formalizes the rules that 
threads and memory must obey in their interaction. Different computa- 
tional models are obtained by modifying the well-formedness conditions 
on event spaces while leaving the operational rules untouched. In par- 
ticular, we implement the prescient stores described in [10, §17.8] which 
allow certain intermediate code optimizations, and prove that such stores 
do not affect the semantics of properly synchronized programs. 



1 Introduction 

The object-oriented programming language Java offers simple and tightly in- 
tegrated support for concurrent programming. In Java’s model of concurrency 
multiple threads of control run in parallel and exchange information by operating 
on objects which reside in a shared main memory. A precise informal descrip- 
tion of this model is given in the Java language specification [10]. Other notable 
references are [4] and [12]. 

This paper presents a formal semantics of a significant sublanguage of Java 
including the running and stopping of threads, thread interaction via shared 
memory, synchronization by monitoring and notification, and sequential control 
mechanisms such as exception handling and return statements. Here we focus 
on the dynamic semantics of Java and leave a detailed treatment of the static, 
type-related aspects of the language, e.g. class declarations, to a followup paper. 

Our semantics is given in the style of Plotkin’s structural operational seman- 
tics (SOS) [15]. In SOS, which has been used in the past for describing SML 
[13], evaluation is driven by the syntactic structure of programs. This allows a 
powerful proof technique for semantic analysis: structural induction. The idea 
inspiring the present work is that the semantics of real concurrent languages such 
as Java, with complex, interacting control features can be given in full detail by 
means of simple structural rules. 

* Research partially supported by the HCM project CHRX-CT94-0591 “De Stijl.” 

J. Alves-Foss (Ed.): Formal Syntax and Semantics of Java, LNCS 1523, pp. 157-200, 1999.
© Springer-Verlag Berlin Heidelberg 1999







Pietro Cenciarelli, Alexander Knapp, Bernhard Reus, and Martin Wirsing 



One of the difficulties in modelling concurrent Java programs consists in 
capturing the complex interplay of memory and thread actions during execution. 
Each thread of control has, in Java, a private working memory in which it 
keeps its own working copy of variables that it must use or assign. As the 
thread executes a program, it operates on these working copies. The main 
memory contains the master copy of each variable. There are rules about when 
a thread is permitted or required to transfer the contents of its working copy 
of a variable into the master copy or vice versa. The process of copying is 
asynchronous. There are also rules which regulate the locking and unlocking 
of objects, by means of which threads synchronize with each other. All this is 
described precisely in [10, §17] in terms of eight kinds of low-level actions: Use, 
Assign, Load, Store, Read, Write, Lock, and Unlock. Here is an example of a 
rule from [10, §17.6, p. 407] involving locks and variables. Let T be a thread, V 
a variable and L a lock: 

“Between an Assign action by T on V and a subsequent Unlock action by
T on L, a Store action by T on V must intervene; moreover, the Write action
corresponding to that Store must precede the Unlock action, as seen by the 
main memory.” 

These rules impose constraints on any implementation of Java so as to allow 
a correct exchange of information among threads. On the other hand they 
intentionally leave much freedom to the implementor, thus permitting certain 
standard hardware and software techniques to improve the speed and efficiency 
of concurrent code. Therefore, it is only on the given rules that the programmer 
should rely to predict the possible behaviour of a concurrent program. Likewise, 
it is only the given rules that should constrain the possible execution traces 
generated by a correct operational semantics. 

The above considerations led us to base our semantics on the notion of event
space. Event spaces correspond roughly to configurations in Winskel’s event structures
[21], which are denotational, non-interleaving models of concurrent languages.
The use of such structures in (interleaving) operational semantics is new. It al- 
lows us to give an abstract, “declarative” account of the Java thread model while 
retaining the virtues of a structural approach. This description is a straight for- 
mal paraphrase of the rules of [10]. Event spaces were introduced in [6], where we 
showed that their use in modelling multi-threading preserves the naive seman- 
tics of “sequential” computations (i.e. computations where one thread interacts 
synchronously with the memory). 

Basing our description of Java on the finely grained notion of event allowed
us to observe phenomena which may not be readily seen when more abstract
approaches are taken. For example, we realized that the asynchrony of
communication between main memory and working memories (viz. the loose coupling of
Read and Load actions, and similarly of Store and Write) is actually observable
in Java. Let threads $\theta_1$ and $\theta_2$, respectively running the code

($\theta_1$) synchronized (p) { p.y = 2; } a = p.x; b = p.y; c = p.y;

($\theta_2$) synchronized (p) { p.y = 3; p.y = 100; } p.x = 1;

share a main memory in which p.x = p.y = 0, and let their working memories
be initially empty. No parallel execution of $\theta_1$ and $\theta_2$ in which main and working
memories interact synchronously could possibly allow the values 1, 2 and 3 to
be assigned respectively to a, b and c. Any model of execution not capable of
producing a run with this assignment of values, which is indeed possible as we show in
Section 2.3, may provide a correct implementation, but cannot be considered
a correct semantics of Java.
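
The two threads above can be tried out directly. The following is only a minimal sketch and not part of the paper: the class names Point and OneTwoThreeDemo and the method runOnce are hypothetical, and the fields are left non-volatile on purpose so that the accesses outside the synchronized blocks stay unsynchronized, as in the example. Note also that the Java memory model was revised in Java 5 (JSR-133), so a current JVM follows different rules from those of [10, §17] formalized here; the program merely reports what a single run produced.

```java
// Hypothetical harness for the two threads of the example.
final class Point {
    int x = 0;   // non-volatile on purpose: reads and writes outside the
    int y = 0;   // synchronized blocks are unsynchronized, as in the example
}

final class OneTwoThreeDemo {
    /** Runs both threads once and returns the values of {a, b, c}. */
    static int[] runOnce() {
        final Point p = new Point();
        final int[] abc = new int[3];

        Thread theta1 = new Thread(new Runnable() {
            public void run() {
                synchronized (p) { p.y = 2; }
                int a = p.x;
                int b = p.y;
                int c = p.y;
                abc[0] = a; abc[1] = b; abc[2] = c;
            }
        });
        Thread theta2 = new Thread(new Runnable() {
            public void run() {
                synchronized (p) { p.y = 3; p.y = 100; }
                p.x = 1;
            }
        });

        theta1.start();
        theta2.start();
        try {
            theta1.join();   // join makes theta1's writes to abc visible here
            theta2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return abc;
    }

    public static void main(String[] args) {
        int[] abc = runOnce();
        System.out.println("a=" + abc[0] + " b=" + abc[1] + " c=" + abc[2]);
    }
}
```

Which triples actually show up depends on the JVM, the hardware and the scheduling, so no particular output should be expected from a given run.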

The operational semantics presented below is parametric in the notion of 
event space. This allows different computational models to be obtained by mod- 
ifying the well-formedness conditions on event spaces while leaving the opera- 
tional rules untouched. To show the flexibility of this approach we study the 
“prescient” store actions introduced in [10, §17.8]. Such actions allow optimiz- 
ing compilers to perform certain kinds of code rearrangements. A bisimulation 
is given to prove that such rearrangements preserve the semantics of properly 
synchronized programs (see also [17]). 

Related work. Several other semantics of sublanguages of Java are available in the 
literature. Much work has also been done on the semantics of the Java Virtual 
Machine [7, 16, 18]; this is one half of a formal semantics of the language, the 
other half being a description of a Java-to-Virtual Machine bytecode compiler,
not available to date. 

In this volume Drossopoulou and Eisenbach [8] give a “small-step” structural 
operational semantics which covers roughly the sequential part of our sublan- 
guage of Java; their work, which is mainly concerned with proving type sound- 
ness, has been formalized by Syme [19]. Von Oheimb and Nipkow [14] also deal 
with a sequential sublanguage of Java and give a formal proof of type safety. A 
noteworthy difference between [8] and [14] is that the latter follows a “big-step” 
approach. In [9] Flatt, Krishnamurthi and Felleisen investigate the semantics of
operators for combining Java classes (so-called “mixins”). All these semantics 
focus on type soundness for a sequential portion of Java. 

As for multi-threading, non-structural descriptions based on abstract state 
machines (see [11]) are given by Börger and Schulte [5], and by Wallace [20].

Synopsis. Section 2 describes and formalizes the Java memory-threads commu- 
nication protocol. Section 3 presents our event-based, structural operational 
semantics of Java. Section 4 studies the notion of prescient store action. Loose 
ends and future research are discussed in Section 5. 

2 Event Spaces 

In this section we describe and formalize the memory-threads communication 
protocol of Java. This is done by writing the rules of [10, §17] as simple logical 
clauses (Section 2.2) and by adopting them as well-formedness conditions on 
structures called event spaces (Section 2.4). The latter are used in the opera- 
tional judgements to constrain the applicability of some operational rules. An 
example of an event space is given in Section 2.3, describing the “1-2-3” parallel
run of the threads $\theta_1$ and $\theta_2$ introduced above.

2.1 Actions and Events 

A formal notion of event is given below in terms of five sets of entities: 

— {Use, Assign, Load, Store, Read, Write, Lock, Unlock}, the action names;

— Thread_id, the thread identifiers;

— Obj, the objects;

— LVal, the left values (or “variables,” following [10]); and

— RVal, the (right) values.

Intuitively, Use and Assign actions do just what their names suggest, oper- 
ating on the private working memories. Read and Load are used for a loosely 
coupled copying of data from the main memory to a working memory and dually 
Store and Write are used for copying data from a working memory to the main 
memory. Lock and Unlock are for synchronizing the access to objects. 

Formally, an action is either a triple $(A, \theta, o)$, where $A \in \{\mathit{Lock}, \mathit{Unlock}\}$, $\theta$
is a thread (identifier) and $o$ is an object, or a 4-tuple of the form $(A, \theta, l, v)$,
where $A \in \{\mathit{Use}, \mathit{Assign}, \mathit{Load}, \mathit{Store}, \mathit{Read}, \mathit{Write}\}$, $l$ is a variable, $v$ is a value
and $\theta$ is as above. When $A \in \{\mathit{Use}, \mathit{Assign}, \mathit{Load}, \mathit{Store}\}$, the tuple $(A, \theta, l, v)$
records that the thread $\theta$ performs an $A$ action on $l$ with value $v$, while, if
$A \in \{\mathit{Read}, \mathit{Write}\}$, it records that the main memory performs an $A$ action on
$l$ with value $v$ on behalf of $\theta$. If $A$ is Lock or Unlock, $(A, \theta, o)$ records that
$\theta$ acquires, or respectively relinquishes, a lock on $o$. Actions with name Use,
Assign, Load, Store, Lock and Unlock are called thread actions, while Read,
Write, Lock and Unlock are memory actions.

Events are instances of actions, which we think of as happening at different
times during execution. We use the same tuple notation for actions and their
instances: the context clarifies which one is meant. When no confusion arises we
may omit components of an action or event which are not immediately relevant
in the context of discourse: so $(\mathit{Read}, l)$ stands for $(\mathit{Read}, \theta, l, v)$, for some $\theta$ and
$v$. Given a thread $\theta$, we write $\alpha(\theta)$ for a generic instance of a thread action
performed by $\theta$. Similarly, $\beta(x)$ indicates a generic instance of a memory action
involving a location or object $x$.
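
The classification of actions above can be written down as a small data model. This is a sketch under our own assumptions rather than anything prescribed by the paper: the names ActionName, Event, onObject and onVariable are hypothetical, and thread identifiers, objects and variables are simplified to strings and values to integers. Event deliberately does not override equals, so two instances of the same action remain distinct events.

```java
/** The eight action names of Section 2.1 (the Java identifiers are illustrative). */
enum ActionName {
    USE, ASSIGN, LOAD, STORE, READ, WRITE, LOCK, UNLOCK;

    /** Use, Assign, Load, Store, Lock and Unlock are thread actions. */
    boolean isThreadAction() {
        return this != READ && this != WRITE;
    }

    /** Read, Write, Lock and Unlock are memory actions. */
    boolean isMemoryAction() {
        return this == READ || this == WRITE || this == LOCK || this == UNLOCK;
    }
}

/** An event is one occurrence of an action; identity distinguishes occurrences. */
final class Event {
    final ActionName name;
    final String thread;   // thread identifier
    final String target;   // object for Lock/Unlock, variable otherwise
    final Integer value;   // null for Lock/Unlock

    private Event(ActionName name, String thread, String target, Integer value) {
        this.name = name;
        this.thread = thread;
        this.target = target;
        this.value = value;
    }

    /** Builds a triple (A, thread, object); A must be Lock or Unlock. */
    static Event onObject(ActionName name, String thread, String object) {
        if (name != ActionName.LOCK && name != ActionName.UNLOCK) {
            throw new IllegalArgumentException(name + " is not a lock action");
        }
        return new Event(name, thread, object, null);
    }

    /** Builds a 4-tuple (A, thread, variable, value); A must not be Lock or Unlock. */
    static Event onVariable(ActionName name, String thread, String variable, int value) {
        if (name == ActionName.LOCK || name == ActionName.UNLOCK) {
            throw new IllegalArgumentException(name + " does not act on a variable");
        }
        return new Event(name, thread, variable, Integer.valueOf(value));
    }

    @Override
    public String toString() {
        String prefix = "(" + name + ", " + thread + ", " + target;
        return value == null ? prefix + ")" : prefix + ", " + value + ")";
    }
}
```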

2.2 The Rules of Interaction

Here we formalize the rules of [10, Chapter 17], to which we refer for a detailed 
discussion. These rules are translated into logical clauses describing the prop- 
erties of a poset of events called the “poset of discourse.” The events of such a 
poset, which are thought of as occurring in the given order, are meant to record 
the activity of memory and threads during the execution of a Java program. We 
assume that every chain of the poset of discourse can be counted monotonically:
$a_0 \le a_1 \le a_2 \le \cdots$. The clauses in our formalization have the form:

\[ \forall \bar{a} \in \eta \,.\, \bigl(\Phi \Rightarrow ((\exists \bar{b}_1 \in \eta \,.\, \Psi_1) \lor (\exists \bar{b}_2 \in \eta \,.\, \Psi_2) \lor \dots \lor (\exists \bar{b}_n \in \eta \,.\, \Psi_n))\bigr) \]

where $\bar{a}$ and $\bar{b}_i$ are lists of events, $\eta$ is the poset of discourse and $\forall \bar{a} \in \eta \,.\, \Phi$
means that $\Phi$ holds for all tuples of events in $\eta$ matching the elements of $\bar{a}$ (and
similarly for $\exists \bar{b}_i \in \eta \,.\, \Psi_i$). The clauses are abbreviated by adopting the following
conventions: quantification over $\bar{a}$ is left implicit when all events in $\bar{a}$ appear
in $\Phi$; quantification over $\bar{b}_i$ is left implicit when all events in $\bar{b}_i$ appear in $\Psi_i$.
Moreover, a rule of the form $\forall \bar{a} \in \eta \,.\, (\mathit{true} \Rightarrow \dots)$ is written $\bar{a} \Rightarrow (\dots)$. When
the symbols $\theta$ and $\theta'$ appear in a rule, we always assume that $\theta \neq \theta'$. Similarly
for values $v$ and $v'$, and for events $a$ and $a'$.

The rules are the following: The actions performed by any one thread are
totally ordered, and so are the actions performed by the main memory for any
one variable or lock [10, §17.2, §17.5].

\[ \alpha(\theta), \alpha'(\theta) \Rightarrow \alpha(\theta) \le \alpha'(\theta) \lor \alpha'(\theta) \le \alpha(\theta) \tag{1} \]

\[ \beta(x), \beta'(x) \Rightarrow \beta(x) \le \beta'(x) \lor \beta'(x) \le \beta(x) \tag{2} \]

Hence, the occurrences of any action $(A, \theta, x)$ are totally ordered in the poset
of discourse. We write $\eta(A, \theta, x)$ for the subposet of $\eta$ including only instances of
$(A, \theta, x)$.

A Store action by $\theta$ on $l$ must intervene between an Assign by $\theta$ of $l$ and a
subsequent Load by $\theta$ of $l$. Less formally, a thread is not permitted to lose its
most recent assign [10, §17.3]:

\[ (\mathit{Assign}, \theta, l) \le (\mathit{Load}, \theta, l) \Rightarrow (\mathit{Assign}, \theta, l) \le (\mathit{Store}, \theta, l) \le (\mathit{Load}, \theta, l) \tag{3} \]

A thread is not permitted to write data from its working memory back to main 
memory for no reason [10, §17.3]: 

\[ (\mathit{Store}, \theta, l) \le (\mathit{Store}, \theta, l)' \Rightarrow (\mathit{Store}, \theta, l) \le (\mathit{Assign}, \theta, l) \le (\mathit{Store}, \theta, l)' \tag{4} \]
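
Because rule (1) totally orders the actions of a single thread, rules (3) and (4) can be checked on a plain sequence of that thread's actions on one variable. The checker below is a minimal sketch under that assumption; the names Op, WorkingCopyRules and satisfiesRules3And4 are hypothetical and do not come from the paper.

```java
import java.util.List;

/** Thread actions on one variable, in the order the thread performs them. */
enum Op { USE, ASSIGN, LOAD, STORE }

final class WorkingCopyRules {
    /**
     * Checks rules (3) and (4) on the actions of one thread on one variable.
     * Rule (3): a Store intervenes between an Assign and a subsequent Load.
     * Rule (4): an Assign intervenes between two Stores.
     */
    static boolean satisfiesRules3And4(List<Op> trace) {
        boolean assignNotYetStored = false;   // an Assign with no Store after it
        boolean storeWithoutAssign = false;   // a Store with no Assign after it
        for (Op op : trace) {
            switch (op) {
                case ASSIGN:
                    assignNotYetStored = true;
                    storeWithoutAssign = false;
                    break;
                case STORE:
                    if (storeWithoutAssign) {
                        return false;         // two Stores, no Assign between: rule (4)
                    }
                    storeWithoutAssign = true;
                    assignNotYetStored = false;
                    break;
                case LOAD:
                    if (assignNotYetStored) {
                        return false;         // the latest Assign would be lost: rule (3)
                    }
                    break;
                default:
                    break;                    // Use is not constrained by these two rules
            }
        }
        return true;
    }
}
```

Checking only the most recent Assign before a Load, and only consecutive Stores, is enough: a Store after the latest Assign also follows every earlier one, and an Assign between each pair of neighbouring Stores lies between any two Stores.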

Threads start with an empty working memory and new variables are created 
only in main memory and are not initially in any thread’s working memory [10, 
§17.3]: 



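\[ (\mathit{Use}, \theta, l) \Rightarrow (\mathit{Assign}, \theta, l) \le (\mathit{Use}, \theta, l) \lor (\mathit{Load}, \theta, l) \le (\mathit{Use}, \theta, l) \tag{5} \]

\[ (\mathit{Store}, \theta, l) \Rightarrow (\mathit{Assign}, \theta, l) \le (\mathit{Store}, \theta, l) \tag{6} \]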

A Use action transfers the contents of the thread’s working copy of a variable 
to the thread’s execution engine [10, §17.1]: 

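\[ \begin{aligned} (\mathit{Assign}, \theta, l, v) \le (\mathit{Use}, \theta, l, v') \Rightarrow{} & (\mathit{Assign}, \theta, l, v) \le (\mathit{Assign}, \theta, l)' \le (\mathit{Use}, \theta, l, v') \;\lor \\ & (\mathit{Assign}, \theta, l, v) \le (\mathit{Load}, \theta, l) \le (\mathit{Use}, \theta, l, v') \end{aligned} \tag{7} \]

\[ \begin{aligned} (\mathit{Load}, \theta, l, v) \le (\mathit{Use}, \theta, l, v') \Rightarrow{} & (\mathit{Load}, \theta, l, v) \le (\mathit{Assign}, \theta, l) \le (\mathit{Use}, \theta, l, v') \;\lor \\ & (\mathit{Load}, \theta, l, v) \le (\mathit{Load}, \theta, l)' \le (\mathit{Use}, \theta, l, v') \end{aligned} \tag{8} \]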







A Store action transmits the contents of the thread’s working copy of a variable 
to main memory [10, §17.1]: 

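\[ (\mathit{Assign}, \theta, l, v) \le (\mathit{Store}, \theta, l, v') \Rightarrow (\mathit{Assign}, \theta, l, v) \le (\mathit{Assign}, \theta, l)' \le (\mathit{Store}, \theta, l, v') \tag{9} \]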

The following rules require some events to be paired in the poset of discourse.
Let $A$ and $B$ be posets, and let $f : A \rightleftharpoons B$ indicate that a function $f$ is either
a monotonic injection $A \to B$ with downward closed codomain or the partial
inverse of a monotonic injection $B \to A$ with downward closed codomain. For
every poset $\eta$ satisfying (1) and (2), for every thread $\theta$, left value $l$ and object
$o$, there exist unique functions

\[ \begin{aligned} \mathit{read\_of}_{\eta,\theta,l} &: \eta(\mathit{Load}, \theta, l) \rightleftharpoons \eta(\mathit{Read}, \theta, l) \\ \mathit{store\_of}_{\eta,\theta,l} &: \eta(\mathit{Write}, \theta, l) \rightleftharpoons \eta(\mathit{Store}, \theta, l) \\ \mathit{lock\_of}_{\eta,\theta,o} &: \eta(\mathit{Unlock}, \theta, o) \rightleftharpoons \eta(\mathit{Lock}, \theta, o). \end{aligned} \]

These are called the “pairing” functions. Indices are omitted when understood.
The function read_of matches the n-th occurrence of $(\mathit{Load}, \theta, l)$ in $\eta$ with the
n-th occurrence of $(\mathit{Read}, \theta, l)$ if such an event exists in $\eta$ and is undefined
otherwise. Similarly for store_of and lock_of.
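
The "n-th with n-th" matching performed by the pairing functions can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's definition: the events of one thread on one variable are given as a single linearized list of action names, and the names Pairing and pair are hypothetical. With from = "Load" and to = "Read" the method plays the role of read_of; -1 stands for "undefined".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Pairing {
    /**
     * For each position of the trace holding the action name `from`, returns the
     * position of the matching occurrence of `to` (n-th with n-th), or -1 when
     * the trace contains no n-th occurrence of `to`. Other positions hold -1.
     */
    static int[] pair(List<String> trace, String from, String to) {
        // Positions of all occurrences of `to`, in trace order.
        List<Integer> toPositions = new ArrayList<Integer>();
        for (int i = 0; i < trace.size(); i++) {
            if (trace.get(i).equals(to)) {
                toPositions.add(i);
            }
        }
        int[] result = new int[trace.size()];
        Arrays.fill(result, -1);
        int n = 0; // number of occurrences of `from` seen so far
        for (int i = 0; i < trace.size(); i++) {
            if (trace.get(i).equals(from)) {
                if (n < toPositions.size()) {
                    result[i] = toPositions.get(n);
                }
                n++;
            }
        }
        return result;
    }
}
```

For the trace Read, Load, Read, Load, Load the first two Loads are paired with the two Reads and the third Load is left unpaired.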

Each Load or Write action is uniquely paired with a preceding Read or Store 
action respectively. Matching actions bear identical values [10, §17.2, §17.3]: 

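\[ (\mathit{Load}, \theta, l, v) \Rightarrow (\mathit{Read}, \theta, l, v) = \mathit{read\_of}(\mathit{Load}, \theta, l, v) \le (\mathit{Load}, \theta, l, v) \tag{10} \]

\[ (\mathit{Write}, \theta, l, v) \Rightarrow (\mathit{Store}, \theta, l, v) = \mathit{store\_of}(\mathit{Write}, \theta, l, v) \le (\mathit{Write}, \theta, l, v) \tag{11} \]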

Rules (10) and (11) ensure that read_of and store_of are total. We call load_of 
and write_of their partial inverses. 

The actions on the master copy of any given variable on behalf of a thread are 
performed by the main memory in exactly the order that the thread requested 
[10, §17.3]: 

\[ (\mathit{Store}, \theta, l) \le (\mathit{Load}, \theta, l) \Rightarrow \mathit{write\_of}(\mathit{Store}, \theta, l) \le \mathit{read\_of}(\mathit{Load}, \theta, l) \tag{12} \]

A thread is not permitted to unlock a lock it does not own [10, §17.5]: 

\[ (\mathit{Unlock}, \theta, o) \Rightarrow \mathit{lock\_of}(\mathit{Unlock}, \theta, o) \le (\mathit{Unlock}, \theta, o) \tag{13} \]

Rule (13) ensures that lock_of is total. We write unlock_of its partial inverse. 

Only one thread at a time is permitted to lay claim to a lock, and moreover 
a thread may acquire the same lock multiple times and does not relinquish 
ownership of it until a matching number of Unlock actions have been performed 
[10, §17.5]: 

\[ (\mathit{Lock}, \theta, o) \le (\mathit{Lock}, \theta', o) \Rightarrow \mathit{unlock\_of}(\mathit{Lock}, \theta, o) \le (\mathit{Lock}, \theta', o) \tag{14} \]
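
Rules (13) and (14) together describe a reentrant lock: only the owner may unlock, and another thread may lock only once every Lock of the owner has been matched by an Unlock. The sketch below enforces this on the linearized Lock and Unlock events of one object, which rule (2) allows us to assume; LockEvent, LockDiscipline and isWellFormed are hypothetical names chosen for illustration.

```java
import java.util.List;

/** A Lock or Unlock event by some thread on one fixed object. */
final class LockEvent {
    final String thread;
    final boolean isLock;

    private LockEvent(String thread, boolean isLock) {
        this.thread = thread;
        this.isLock = isLock;
    }

    static LockEvent lock(String thread) { return new LockEvent(thread, true); }

    static LockEvent unlock(String thread) { return new LockEvent(thread, false); }
}

final class LockDiscipline {
    /** Checks a linearized trace of lock events on one object against rules (13) and (14). */
    static boolean isWellFormed(List<LockEvent> trace) {
        String owner = null; // thread currently holding the lock, if any
        int depth = 0;       // Locks by the owner not yet matched by an Unlock
        for (LockEvent e : trace) {
            if (e.isLock) {
                if (owner != null && !owner.equals(e.thread)) {
                    return false; // an earlier Lock by another thread is unmatched: rule (14)
                }
                owner = e.thread;
                depth++;
            } else {
                if (owner == null || !owner.equals(e.thread)) {
                    return false; // Unlock with no matching earlier Lock: rule (13)
                }
                depth--;
                if (depth == 0) {
                    owner = null;
                }
            }
        }
        return true;
    }
}
```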

If a thread is to perform an Unlock action on any lock, it must first copy all
assigned values in its working memory back out to main memory [10, §17.6] (this
rule formalizes the quotation in the introduction):

\[ (\mathit{Assign}, \theta, l) \le (\mathit{Unlock}, \theta) \Rightarrow (\mathit{Assign}, \theta, l) \le \mathit{store\_of}(\mathit{Write}, \theta, l) \le (\mathit{Write}, \theta, l) \le (\mathit{Unlock}, \theta) \tag{15} \]

A Lock action acts as if it flushes all variables from the thread’s working memory; 
before use they must be assigned or loaded from main memory [10, §17.6]: 

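\[ \begin{aligned} (\mathit{Lock}, \theta) \le (\mathit{Use}, \theta, l) \Rightarrow{} & (\mathit{Lock}, \theta) \le (\mathit{Assign}, \theta, l) \le (\mathit{Use}, \theta, l) \;\lor \\ & (\mathit{Lock}, \theta) \le \mathit{read\_of}(\mathit{Load}, \theta, l) \le (\mathit{Load}, \theta, l) \le (\mathit{Use}, \theta, l) \end{aligned} \tag{16} \]

\[ (\mathit{Lock}, \theta) \le (\mathit{Store}, \theta, l) \Rightarrow (\mathit{Lock}, \theta) \le (\mathit{Assign}, \theta, l) \le (\mathit{Store}, \theta, l) \tag{17} \]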

Discussion. Each of the above rules corresponds to one rule in [10]. Note that
the language specification requires any Read action to be completed by a
corresponding Load and similarly for Store and Write. The above theory does not
include clauses expressing such requirements because it must capture
“incomplete” program executions (see Section 4). Except for read and store completion,
any rule in [10] which we have not included above can be derived in our
axiomatization. In particular,

\[ (\mathit{Load}, \theta, l) \le (\mathit{Store}, \theta, l) \Rightarrow (\mathit{Load}, \theta, l) \le (\mathit{Assign}, \theta, l) \le (\mathit{Store}, \theta, l) \tag{$*$} \]

of [10, §17.3] holds in any model of the axioms. In fact, by (6) there must
be some Assign action before the Store; moreover, one such Assign must
intervene between the Load and the Store, because otherwise, from (1) and
(3), there would be a chain $(\mathit{Store}, \theta, l) \le (\mathit{Load}, \theta, l) \le (\mathit{Store}, \theta, l)$ with no
Assign in between, which contradicts (4). Similarly, the following rule of [10,
§17.3] derives from (10) and (11):

\[ (\mathit{Load}, \theta, l) \le (\mathit{Store}, \theta, l) \Rightarrow \mathit{read\_of}(\mathit{Load}, \theta, l) \le \mathit{write\_of}(\mathit{Store}, \theta, l) \]

Clauses (6) and (17) simplify the corresponding rules of [10, §17.3, §17.6] which
include a condition $(\mathit{Load}, \theta, l) \le (\mathit{Store}, \theta, l)$ to the right of the implication.
This would be redundant because of ($*$).

2.3 Example 

We briefly illustrate the above formal rules on the example given in the
introduction, where two threads

($\theta_1$) synchronized (p) { p.y = 2; } a = p.x; b = p.y; c = p.y;

($\theta_2$) synchronized (p) { p.y = 3; p.y = 100; } p.x = 1;

start with a main memory where both instance variables p.x and p.y have value
0, and with empty working memories, and interact so that the values 1, 2 and 3
are eventually assigned to a, b, and c respectively. We shall run part of this
example through our operational rules in Section 3.7. Figure 1 describes this
run as a poset of events, whose ordering is represented by the arrows. The actions
of the two threads and of the main memory on the two instance variables p.x
and p.y are aligned vertically in four columns. We let $o$ be the object denoted
by p, while $x$ and $y$ stand for the left values of p.x and p.y respectively.

Since all actions performed by the same thread and by the memory on the
same variable must be totally ordered, each column of Figure 1 is a chain.
Moreover, some memory actions must occur before or after some thread actions. For
example, a $(\mathit{Write}, \theta_1, y, 2)$ must come after $(\mathit{Assign}, \theta_1, y, 2)$ because, as dictated
by the structure of the program, an Unlock follows the assignment p.y = 2, and
hence, by (15), $\theta_1$’s working copy of $y$ must be written in main memory before
the Unlock and after a corresponding Store. Note that not all the assigned
values must be stored in main memory. For example, it would have been legal to
omit $(\mathit{Store}, \theta_2, y, 3)$ and $(\mathit{Write}, \theta_2, y, 3)$; in this case, however, the value 3 would
have never been passed to $\theta_1$. Similarly, not all the values used by a thread must
be first loaded from main memory: in the example no $(\mathit{Load}, \theta_1, y, 2)$ precedes
$(\mathit{Use}, \theta_1, y, 2)$.

As stated in the introduction, the above assignments to a, b and c would
not be possible if communication between main and working memories were
“synchronous,” that is if no other event were allowed to happen between a Read
and a corresponding Load or, equivalently, if these two actions were executed as
a single atomic step (and similarly for Store and Write). Assume in fact that
there is a synchronous run producing a = 1, b = 2, and c = 3. Since 3 must
be assigned to c, an action $(\mathit{Read}, \theta_1, y, 3)$ must occur, and moreover it must be
after $\theta_2$ writes 3 and before it writes 100 in the master copy of $y$. Hence, by (15),
$(\mathit{Read}, \theta_1, y, 3)$ must occur while $\theta_2$ is executing the synchronized block. Again
by (15), a $(\mathit{Store}, \theta_1, y, 2)$ must occur before $\theta_1$ exits its synchronized block;
moreover this Store must occur before $(\mathit{Read}, \theta_1, y, 3)$, otherwise the value 3
would be lost, and therefore $\theta_1$ must enter its synchronized block before $\theta_2$.
Then, in order to get the value 1 for a, the assignment a = p.x must occur
after $\theta_2$ has left the block and has assigned, stored and written 1 in $x$, and after
$\theta_1$ has read and loaded such value in its working copy of $x$. However, by the
time $\theta_1$ can load 1 in $x$, the value of $y$ in its working memory must already be 3,
because a $(\mathit{Read}, \theta_1, y, 3)$ occurred while $\theta_2$ was executing the synchronized block.
Therefore, to assign 2 to b, $\theta_1$ can rely neither on the content of its working
copy of $y$, nor on the master copy in main memory, which, by now, must contain
100.

2.4 Event Spaces 

An event space is a poset of events every chain of which can be counted
monotonically ($a_0 \le a_1 \le a_2 \le \dots$) and satisfying conditions (1) to (17) of Section 2.2.

Event spaces serve two purposes in our operational semantics: On the one
hand they provide all the information needed to reconstruct the working
memories (which in fact do not appear in the operational judgements). On the other
hand event spaces record the “historical” information on the computation which
constrains the execution of certain actions according to the language
specification, and hence the applicability of certain operational rules (see Section 3.4).

[Figure 1: poset diagram with four columns, each a chain read from top to bottom; the arrows between columns are not reproduced.]

Thread $\theta_1$: (Lock, $\theta_1$, o), (Assign, $\theta_1$, y, 2), (Store, $\theta_1$, y, 2), (Unlock, $\theta_1$, o), (Load, $\theta_1$, x, 1), (Use, $\theta_1$, x, 1), (Use, $\theta_1$, y, 2), (Load, $\theta_1$, y, 3), (Use, $\theta_1$, y, 3)

Main memory on y: (Write, $\theta_1$, y, 2), (Write, $\theta_2$, y, 3), (Read, $\theta_1$, y, 3), (Write, $\theta_2$, y, 100)

Thread $\theta_2$: (Lock, $\theta_2$, o), (Assign, $\theta_2$, y, 3), (Store, $\theta_2$, y, 3), (Assign, $\theta_2$, y, 100), (Store, $\theta_2$, y, 100), (Unlock, $\theta_2$, o), (Assign, $\theta_2$, x, 1), (Store, $\theta_2$, x, 1)

Main memory on x: (Write, $\theta_2$, x, 1), (Read, $\theta_1$, x, 1)

Figure 1. An event space for Example 2.3

Given two event spaces $(X, \le_X)$ and $(Y, \le_Y)$, we say that $(X, \le_X)$ is a
conservative extension of $(Y, \le_Y)$ when $Y \subseteq X$ and ${\le_Y} \subseteq {\le_X}$ and, for all $a, b \in Y$,
$a \le_X b$ implies $a \le_Y b$.
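
The three conditions defining a conservative extension translate directly into set operations. The following is a minimal sketch under our own representation choices: events are plain strings, an order is a set of pairs, and the names ConservativeExtension, pair, setOf and isConservativeExtension are hypothetical. The check does not verify that the relations are partial orders or that rules (1) to (17) hold.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ConservativeExtension {
    /** An ordered pair (a, b), read as "a precedes or equals b". */
    static List<String> pair(String a, String b) {
        return Arrays.asList(a, b);
    }

    /** Convenience constructor for small sets. */
    @SafeVarargs
    static <T> Set<T> setOf(T... items) {
        return new HashSet<T>(Arrays.asList(items));
    }

    /** Tests whether (x, leX) is a conservative extension of (y, leY). */
    static boolean isConservativeExtension(Set<String> x, Set<List<String>> leX,
                                           Set<String> y, Set<List<String>> leY) {
        // Y must be contained in X, and the order on Y in the order on X.
        if (!x.containsAll(y) || !leX.containsAll(leY)) {
            return false;
        }
        // No new ordering may be introduced between two events already in Y.
        for (List<String> p : leX) {
            boolean bothInY = y.contains(p.get(0)) && y.contains(p.get(1));
            if (bothInY && !leY.contains(p)) {
                return false;
            }
        }
        return true;
    }
}
```

Adding a fresh event above an old one is accepted, whereas relating two previously unordered events of Y is rejected.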

To adjoin a new event $a$ to an event space $\eta = (X, \le_X)$, we use an operation $\oplus$
defined as follows: $\eta \oplus a$ denotes nondeterministically an event space $\eta' = (Y, \le_Y)$
such that:

— $\eta'$ is a conservative extension of $\eta$, with $Y = X \cup \{a\}$;

— if $a = \alpha(\theta)$ is a thread action performed by $\theta$, then $a' \le_Y a$ for all thread
actions $a' = \alpha'(\theta)$ by $\theta$ in $\eta'$;

— if $a = \beta(x)$ is a memory action on $x$, then $a' \le_Y a$ for all memory actions
$a' = \beta'(x)$ on $x$ in $\eta'$.

If no event space $\eta'$ exists satisfying these conditions, then $\eta \oplus a$ is undefined. For
example, by (5), the term $\eta \oplus (\mathit{Use}, \theta, l)$ is defined only if a suitable $(\mathit{Assign}, \theta, l)$
or $(\mathit{Load}, \theta, l)$ occurs in $\eta$. If $\eta$ is an event space and $\bar{a} = (a_1, a_2, \dots, a_n)$ is a
sequence of events, we write $\eta \oplus \bar{a}$ for $\eta \oplus a_1 \oplus a_2 \oplus \dots \oplus a_n$.

As little ordering may be added to an event space by the operation $\oplus$ as
is required by the rules of interaction: indeed two expressions $\eta \oplus a \oplus b$ and
$\eta \oplus b \oplus a$ may denote the same event space. This reflects the fact that the same
concurrent activity may be described by different sequences of interleaved events.
More ordering can also be introduced than strictly dictated by the rules. For
example, the expression $(\mathit{Lock}, \theta, o) \oplus (\mathit{Read}, \theta, l, v) \oplus (\mathit{Load}, \theta, l, v)$ may produce
an event space $\{(\mathit{Lock}, \theta, o) \le (\mathit{Read}, \theta, l, v) \le (\mathit{Load}, \theta, l, v)\}$; although no rule
enforces that $(\mathit{Lock}, \theta, o) \le (\mathit{Read}, \theta, l, v)$, it had better be so in view of rule (16) if a
$(\mathit{Use}, \theta, l)$ is to be further added to the space.

3 Operational Semantics 

The present paper focuses on the dynamic semantics of Java. Of course, the
behaviour of a program may depend on type information obtained from static
analysis. Part of this information we assume is retrievable at run-time from the
main memory (see Section 3.1), part goes to enrich the syntactic terms upon
which the operational semantics operates (see Section 3.2).

In Java every variable and every expression has a type which is known at
compile-time. The type limits the possible values that the variable can hold or
expression can produce at run-time. Adopting the terminology of [10], every
object belongs to a class (the class of the object, the one which is mentioned
when the object is created). Moreover, the values contained by a variable or
produced by an expression should, by the design of the language, be compatible
with the type of the variable or expression. A value of primitive type (such as
booleans) is only compatible with that type (boolean), while a reference to an
object is compatible with any class type which is a superclass of the object’s
class [10, §4.5.5]. We do not implement run-time compatibility checks in our
semantics (they can be added straightforwardly). For example, as in Java, we
do not check that the object produced by evaluating the expression e in throw e;
is compatible with Throwable. However, we do use type information wherever
it is needed to drive computation. An example is the execution of a try-catch
statement (see Section 3.8).

Java’s modifiers are not treated in the present paper. For example, we do
not consider static fields; these would require minor changes of the semantic
machinery. Similarly, synchronized methods can be easily implemented by using
synchronized statements (see Section 3.7), as remarked in [10, §8.4.3.5].

After introducing in Section 3.1 semantic domains such as stores and envi- 
ronments, we describe a “compilation” function translating Java programs into 
semantically enriched abstract syntax (Section 3.2). Next, we define operational 
judgements (Section 3.3) and give the SOS rules which generate them. These 
are presented in homogeneous groups (expressions, statements, exceptions, etc.) 
in Sections 3.4 to 3.10.

3.1 Semantic Domains 

Primitive semantic domains. These are the building blocks of our operational 
semantics, and nothing is assumed on the structure of their elements. 

We call RVal the primitive domain of (right) values. These are produced by
the evaluation of expressions and can be assigned to variables. A distinguished
subset Obj of RVal is also given as primitive; we call its elements (references to)
objects. In particular, since threads are objects in Java, we choose the domain
Thread_id of the previous section to be Obj. Right values come equipped with a
primitive function value mapping literals to the corresponding values.

\[ \mathit{value} : \mathit{Literal} \to \mathit{RVal} \]

In particular, $\mathit{null}$ is the reference to the null object denoted by the literal null,
that is: $\mathit{null} = \mathit{value}(\texttt{null})$. Similarly, $\mathit{true} = \mathit{value}(\texttt{true})$ and so on.

In Java the object denoted by an expression e may contain several fields
with the same name $i$; then, the type of e decides which field is actually
accessed by the expression e.i. An identifier together with a type is therefore a
non-ambiguous name for field access. We call FieldIdentifier, ranged over by $f$,
the set of such pairs (see Table 1). The domain of left-values introduced in the
previous section is not primitive: an instance variable is addressed by a non-null
object reference $o$ together with a field identifier $f$, and written $o.f$.

\[ \mathit{LVal} = (\mathit{Obj} \setminus \{\mathit{null}\}) \times \mathit{FieldIdentifier} \]

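Store is the primitive domain of stores ranged over by $\mu$. This domain comes
equipped with the following primitive semantic functions, where ClassType is as
in Appendix A:

\[ \begin{aligned} \mathit{new} &: \mathit{ClassType} \times \mathit{Store} \to \mathit{Obj} \times \mathit{Store} \\ \mathit{upd} &: \mathit{LVal} \times \mathit{RVal} \times \mathit{Store} \rightharpoonup \mathit{Store} \\ \mathit{rval} &: \mathit{LVal} \times \mathit{Store} \rightharpoonup \mathit{RVal}. \end{aligned} \]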

Besides providing storage for variables, stores are assumed to contain infor- 
mation produced by the static analysis of a program; typically: the names and 
types of fields and methods for each class, the initial values of fields, the subclass 
relation, and so on. This information does not change during execution and it 
could alternatively be kept separate from stores. 

Given a class type $C$ and a store $\mu$, the function new produces a new object
of type $C$ with suitably initialized instance variables, and returns it in output
together with $\mu$ updated with the new object. We write:

\[ o \in_\mu C, \]

dropping $\mu$ when understood, to mean that $o$ is a reference to an object in $\mu$ of a
class type which is compatible with $C$. We also assume that the partial function
$\mathit{init} : \mathit{FieldIdentifier} \times \mathit{Store} \rightharpoonup \mathit{RVal}$ returns the initial values for an object’s
fields. The domain of this function is the set of pairs $(f, \mu)$ where $f = (i, C)$ and
$i$ is an appropriate field for $C$ in $\mu$.

The function upd updates a store, while rval gets the right-value associated
in a store with a given left-value. These functions are partial: they are undefined
on the left-values $o.f$ where $f$ is not an appropriate field for $o$ in the given store.
We write $\mu[l \mapsto v]$ and $\mu(l)$ for $\mathit{upd}(l, v, \mu)$ and $\mathit{rval}(l, \mu)$ respectively.

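A rather weak axiomatization of stores is given below by using a binary
predicate $\preceq$ (written infix). The meaning of $e_1 \preceq e_2$ is that if $e_1$ is defined, then
so is $e_2$ and they denote the same value. By $e_1 \simeq e_2$ we mean that both $e_1 \preceq e_2$
and $e_2 \preceq e_1$ hold.

\[ \begin{aligned} \mu(l) &\preceq \mu'(l) && \text{where } \mathit{new}(C, \mu) = (o, \mu') \\ \mathit{init}((i, C), \mu) &\preceq \mu'(o.(i, C)) && \text{where } \mathit{new}(C, \mu) = (o, \mu') \\ \mu[l \mapsto v](l) &\preceq v \\ \mu[l' \mapsto v](l) &\simeq \mu(l) && \text{if } l \neq l' \\ \mu[l \mapsto v'][l \mapsto v] &\preceq \mu[l \mapsto v] \\ \mu[l' \mapsto v'][l \mapsto v] &\simeq \mu[l \mapsto v][l' \mapsto v'] && \text{if } l \neq l' \\ \mu[l \mapsto \mu(l)] &\preceq \mu \end{aligned} \]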
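
One simple model of the update and lookup axioms is a finite map from left values to values. The class below is a sketch under simplifying assumptions rather than the paper's primitive domain: left values are strings, right values are integers, the functions new and init are left out, and the names SimpleStore and withCells are hypothetical. Partiality is modelled by throwing IllegalArgumentException on a left value the store does not contain.

```java
import java.util.HashMap;
import java.util.Map;

/** An immutable store mapping left values (here strings) to right values (here ints). */
final class SimpleStore {
    private final Map<String, Integer> cells;

    private SimpleStore(Map<String, Integer> cells) {
        this.cells = cells;
    }

    /** A store containing exactly the given left values, each initialised to 0. */
    static SimpleStore withCells(String... leftValues) {
        Map<String, Integer> cells = new HashMap<String, Integer>();
        for (String l : leftValues) {
            cells.put(l, 0);
        }
        return new SimpleStore(cells);
    }

    /** upd: returns the updated store; undefined if l is not a cell of this store. */
    SimpleStore upd(String l, int v) {
        requireDefined(l);
        Map<String, Integer> copy = new HashMap<String, Integer>(cells);
        copy.put(l, v);
        return new SimpleStore(copy);
    }

    /** rval: the value held by l; undefined if l is not a cell of this store. */
    int rval(String l) {
        requireDefined(l);
        return cells.get(l);
    }

    private void requireDefined(String l) {
        if (!cells.containsKey(l)) {
            throw new IllegalArgumentException("undefined left value: " + l);
        }
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SimpleStore && cells.equals(((SimpleStore) other).cells);
    }

    @Override
    public int hashCode() {
        return cells.hashCode();
    }
}
```

In this model an update followed by a lookup of the same left value returns the written value, updates of distinct left values commute, and rewriting a cell with its current value yields an equal store.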



Finally, Throws is the primitive domain of exceptional results. Upon occur- 
rence of an exception, Java allows objects to be passed to handlers as “reasons” 
for the exception. The primitive function 

\[ \mathit{throw} : \mathit{Obj} \to \mathit{Throws} \]

turns an object $o$ into an exception $\mathit{throw}(o)$ “with reason $o$.” Note that elements
of $\mathit{Throws}$ are not right values.

Environments and stacks. Environments are pairs $(I, \rho)$ where $I$ is a subset of
$\mathit{Identifier} \cup \{\mathtt{this}\}$ and $\rho$ is a partial function from $I$ to right values:

\[ \mathit{Env} = \sum_{I \subseteq \mathit{Identifier} \cup \{\mathtt{this}\}} (I \rightharpoonup \mathit{RVal}) \]

The component $I$ of an environment $(I, \rho)$, called the source of $\rho$, is meant to
contain the local variables of a block and the formal parameters of a method body
or of an exception handler. Environments are also used to store the information
on which object's code is currently being executed: $\rho(\mathtt{this})$. By abuse of notation,
we write $\rho$ for an environment $(I, \rho)$ and indicate with $\mathit{src}(\rho)$ its source $I$. In
particular, we understand that $\rho_\emptyset$ is an empty environment $(I, \rho_\emptyset)$ such that $\rho_\emptyset(i)$
is undefined for all $i \in I$. As usual, $\rho[i \mapsto v](j) = v$ if $i = j$ and $\rho[i \mapsto v](j) \simeq \rho(j)$
otherwise.

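Let $\mathit{Stack}$ be the domain of stacks of environments, and let the metavariable
$\sigma$ range over this domain. The empty stack is written $\sigma_\emptyset$. The operation
$\mathit{push} : \mathit{Env} \times \mathit{Stack} \to \mathit{Stack}$ is the usual one on stacks. An instance variable declaration
$i = v$ binds $v$ to $i$ in the topmost environment of a stack $\sigma$; we write $\sigma[i = v]$ for the
result of this operation. The result of assigning $v$ to $i$ in the first environment
$(I, \rho)$ of $\sigma$ such that $i \in I$ is written $\sigma[i \mapsto v]$. The value associated with $i$ in
such an environment is denoted by $\sigma(i)$. More precisely:

\[
\sigma[i = v] = \begin{cases}
\mathit{push}(\rho[i \mapsto v], \sigma') & \text{if } \sigma = \mathit{push}(\rho, \sigma') \text{ and } i \in \mathit{src}(\rho) \\
\text{undefined} & \text{otherwise;}
\end{cases}
\]

\[
\sigma[i \mapsto v] = \begin{cases}
\mathit{push}(\rho[i \mapsto v], \sigma') & \text{if } \sigma = \mathit{push}(\rho, \sigma') \text{ and } i \in \mathit{src}(\rho) \\
\mathit{push}(\rho, \sigma'[i \mapsto v]) & \text{if } \sigma = \mathit{push}(\rho, \sigma') \text{ and } i \notin \mathit{src}(\rho) \\
\text{undefined} & \text{otherwise;}
\end{cases}
\]

\[
\sigma(i) = \begin{cases}
\rho(i) & \text{if } \sigma = \mathit{push}(\rho, \sigma') \text{ and } i \in \mathit{src}(\rho) \\
\sigma'(i) & \text{if } \sigma = \mathit{push}(\rho, \sigma') \text{ and } i \notin \mathit{src}(\rho) \\
\text{undefined} & \text{otherwise.}
\end{cases}
\]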
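The three stack operations translate almost literally into code. The sketch below assumes a representation that is not in the paper: a stack is a Python tuple of environments with the topmost first, an environment is a pair `(source, bindings)` mirroring $(I, \rho)$, and "undefined" is modelled by raising `LookupError`. The names `declare`, `assign` and `lookup` are illustrative.

```python
def declare(stack, i, v):
    """sigma[i = v]: bind v to i in the topmost environment only."""
    if not stack:
        raise LookupError(i)
    (src, rho), rest = stack[0], stack[1:]
    if i not in src:
        raise LookupError(i)
    return ((src, {**rho, i: v}),) + rest


def assign(stack, i, v):
    """sigma[i |-> v]: update i in the first environment whose source contains i."""
    if not stack:
        raise LookupError(i)
    (src, rho), rest = stack[0], stack[1:]
    if i in src:
        return ((src, {**rho, i: v}),) + rest
    return ((src, rho),) + assign(rest, i, v)


def lookup(stack, i):
    """sigma(i): value of i in the first environment whose source contains i."""
    if not stack:
        raise LookupError(i)
    (src, rho), rest = stack[0], stack[1:]
    if i in src:
        if i not in rho:
            raise LookupError(i)      # rho(i) itself is undefined
        return rho[i]
    return lookup(rest, i)
```

Note the asymmetry that the definitions impose: a declaration never looks below the topmost environment, whereas assignment and lookup search downwards through the stack.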



3.2 Abstract Terms 

The operational semantics presented below does not work directly on the Java
syntax of Appendix A, which we call concrete, but on the abstract terms produced
by the grammar of Table 1. We call A-Term the set of abstract terms and
let $t$ range over this set. Concrete and abstract syntax share the clauses defining
Identifier, Literal, ReturnType and ClassInstanceCreationExpression.

Some of the abstract terms, those which cannot be further evaluated, play
the role of results in our operational semantics. There are operational rules
which only apply when a result is produced ([assign4], for example). Some of the
results are called abrupt (see Section 3.8), as specified by the following grammar:

Results ::= * | RVal | AbruptResults
AbruptResults ::= Throws | return RVal | return

The terms return $v$ and return are results produced by evaluating return statements,
respectively with and without a return value.

In most cases, abstract terms look just like their concrete counterparts. Some
abstract terms, however, are enriched with semantic information produced by
the static analysis of the Java program. For example, abstract blocks, which we
write $\{S\}_\rho$, have two components: a sequence $S$ of (abstract) statements and an
environment $\rho$ containing the local variables of the block. We leave $\rho$ implicit
when irrelevant.

Unlike with field identifiers, the method invoked by a method call $e.i(\ldots)$
is only known at run-time, because it depends not only on the static type $C$ of
$e$ but on the dynamic class type of the object denoted by $e$. At compile-time,
however, a “most specific compile-time declaration” is chosen for $i$ among the
methods of $C$ and of its superclasses. The class where this declaration is found,
the types of the parameters and the return type are attached by the compiler to
$i$ for later run-time usage (see [10, §15.11] for more detail). This motivates the
introduction of the domain MethodIdentifier in the abstract syntax. When the
rest is understood, we write just the identifier of a method identifier.

A recursive function $(\_)^\circ$ translates concrete into abstract syntax. Terms
of the shared domains are translated into themselves. The concrete list-like
syntactic domains, such as BlockStatements, are translated in the obvious way
into abstract domains of the form $K^*$ and $K^+$, where:

\[ K^* ::= () \mid K\,K^* \]

\[ K^+ ::= K \mid K\,K^*. \]

Lists that are optional in a concrete term are translated into the empty list $()$
when missing. In writing abstract terms we often omit the empty list.

The translation is generally trivial. For example: $(\mathtt{throw}\ (e))^\circ = \mathtt{throw}\ (e^\circ)$.
All non-trivial cases are listed in Table 2. We understand that a “declaration
environment” is implicitly carried along during translation, recording the static
information collected from processing class declarations. We express that an
expression $e$ has declared type $\tau$ (in the current declaration environment) by
writing $e : \tau$.

Every syntactic domain A-$K$ of the abstract syntax corresponds to a concrete
domain $K$, and the translation is such that $t \in K$ whenever $t^\circ \in$ A-$K$. There
are syntactic categories in the abstract syntax which have no counterpart in the
concrete; these are: Obj, RVal, Throws, FieldIdentifier, MethodIdentifier and
ActivationFrame. Of these only the latter is still to be discussed, which we do
in Section 3.5.



3.3 Operational Judgements 

Configurations. A configuration represents the state of execution of a multi- 
threaded Java program; therefore, it may include several abstract terms, one for 
each thread of execution. Each thread has an associated stack. We call M-term 







A-Statement ::= ; | A-Block | A-StatementExpression ;
              | synchronized ( A-Expression ) A-Block
              | A-IfThenStatement | AbruptResults
              | throw A-Expression ; | A-TryStatement
              | return ; | return A-Expression ;
A-Block ::= { A-BlockStatement* }_Env
A-BlockStatement ::= A-LocalVariableDeclaration ; | A-Statement
A-LocalVariableDeclaration ::= Type A-VariableDeclarator+
A-VariableDeclarator ::= Identifier = A-Expression
A-Expression ::= RVal | Throws | Literal | Identifier | this
              | A-FieldAccess | ClassInstanceCreationExpression
              | A-MethodInvocation | ActivationFrame
              | A-Assignment | UnaryOperator A-Expression
              | A-Expression BinaryOperator A-Expression
A-FieldAccess ::= A-Expression . FieldIdentifier
FieldIdentifier ::= (Identifier, ClassType)
A-MethodInvocation ::= A-Expression . MethodIdentifier ( A-Expression* )
MethodIdentifier ::= (Identifier, ClassType, Type*, ResultType)
ActivationFrame ::= (MethodIdentifier, A-Block)
A-Assignment ::= A-LeftHandSide = A-Expression
A-LeftHandSide ::= Identifier | A-FieldAccess
A-StatementExpression ::= A-Assignment | ClassInstanceCreationExpression
              | A-MethodInvocation | ActivationFrame
A-TryStatement ::= try A-Block A-CatchClause+
              | try A-Block A-CatchClause* finally A-Block
A-CatchClause ::= catch ( Type Identifier ) A-Block
A-IfThenStatement ::= if ( A-Expression ) A-Statement

Table 1. Abstract syntax



\[
\begin{array}{lcll}
\{ S \}^\circ & = & \{ S^\circ \}_{(I, \rho_\emptyset)} & \text{where } I \text{ is the set of local variables declared in } S \\
(\mathtt{catch}\ (\tau\ i)\ b)^\circ & = & \mathtt{catch}\ (\tau\ i)\ \{ S \}_{(I \cup \{i\}, \rho_\emptyset)} & \text{where } \{ S \}_{(I, \rho_\emptyset)} = b^\circ \\
((e))^\circ & = & e^\circ & \\
(e.i)^\circ & = & e^\circ.f & \text{where } e : \tau \text{ and } f = (i, \tau) \\
(e.i(E))^\circ & = & e^\circ.m(E^\circ) & \text{where } m = (i, C, T, \tau) \text{ and the “compile-time declaration” of } i \\
 & & & \text{is found in } C \text{ and has signature } T \to \tau \\
i^\circ & = & \begin{cases} i \\ \mathtt{this}.f \end{cases} & \begin{array}{l} \text{if } i \text{ appears in the scope of a local variable declaration with that name;} \\ \text{otherwise, where } \mathtt{this} : \tau \text{ and } f = (i, \tau). \end{array}
\end{array}
\]

Table 2. Translation to abstract syntax

a partial map from thread identifiers to pairs $(t, \sigma)$, where $t$ is an abstract term
and $\sigma$ is a stack. We let the metavariable $T$ range over M-terms:

\[ T : \mathit{Thread\_Id} \rightharpoonup \text{A-Term} \times \mathit{Stack}. \]

When we assume that $\theta$ is not in the domain of $T$ we write $T \mid (\theta, t, \sigma)$ for the
M-term $T'$ such that $T'(\theta) = (t, \sigma)$ and $T'(\theta') \simeq T(\theta')$ for $\theta' \neq \theta$, where $\simeq$ is as
in Section 3.1.

A configuration of the operational semantics is a triple $(T, \eta, \mu)$ consisting
of an M-term $T$, an event space $\eta$ and a store $\mu$. In writing configurations, we
generally drop the parentheses and all parts that are not immediately relevant in
the context of discourse; for example, we may write just “$t, \sigma, \mu$” to mean some
configuration $(T \mid (\theta, t, \sigma), \eta, \mu)$. Configurations are ranged over by $\gamma$.

Operational semantics. The operational semantics is the smallest binary relation
$\to$ on configurations which is closed under the rules of Sections 3.4 to 3.10.
These are, in fact, rule schemes, whose instances are obtained by replacing
the metavariables with suitable semantic objects. Rules with no premise are
called axioms. Related pairs of configurations are written $\gamma_1 \to \gamma_2$ and called
operational judgements or transitions.

Rule conventions. In writing an axiom $\gamma_1 \to \gamma_2$ we focus only on the relevant
parts of the configurations involved, and understand that whatever is omitted
from $\gamma_1$ remains unchanged in $\gamma_2$. For example, we understand that the axiom
$; \to *$ stands for $T \mid (\theta, ;, \sigma), \eta, \mu \to T \mid (\theta, *, \sigma), \eta, \mu$. On the other hand, rules
with a premise are read by assuming that whatever changes occur in the omitted
parts of the premise (besides thread identifiers) also occur in the conclusion
(unless otherwise specified). For example, we understand that:

\[
\frac{e_1 \to e_2}{e_1; \to e_2;}
\quad \text{stands for} \quad
\frac{T_1 \mid (\theta, e_1, \sigma_1), \eta_1, \mu_1 \to T_2 \mid (\theta, e_2, \sigma_2), \eta_2, \mu_2}
     {T_1 \mid (\theta, e_1;, \sigma_1), \eta_1, \mu_1 \to T_2 \mid (\theta, e_2;, \sigma_2), \eta_2, \mu_2}.
\]
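The reading of abbreviated rules can be made concrete with a small sketch. It assumes an encoding that is not part of the paper: a configuration is a Python triple `(threads, eta, mu)`, `threads` maps thread identifiers to `(term, stack)` pairs, a rule is a function on `(term, sigma, eta, mu)`, stacks are simplified to a single dictionary, and an expression statement `e;` is encoded as `("expstat", e)`. The helper names are hypothetical.

```python
def lift(rule, config, theta):
    """Read an abbreviated rule as a transition between full configurations."""
    threads, eta, mu = config
    term, sigma = threads[theta]
    term2, sigma2, eta2, mu2 = rule(term, sigma, eta, mu)
    # Every other thread is left untouched; changes to eta and mu are kept.
    return ({**threads, theta: (term2, sigma2)}, eta2, mu2)


def skip(term, sigma, eta, mu):
    """Axiom  ; -> *  : everything that is not mentioned stays unchanged."""
    if term != ";":
        raise ValueError("[skip] does not apply")
    return "*", sigma, eta, mu


def expstat1(premise):
    """Rule  e1 -> e2  /  e1; -> e2;  : changes made by the premise carry over."""
    def rule(term, sigma, eta, mu):
        tag, e1 = term
        if tag != "expstat":
            raise ValueError("[expstat1] does not apply")
        e2, sigma2, eta2, mu2 = premise(e1, sigma, eta, mu)
        return ("expstat", e2), sigma2, eta2, mu2
    return rule


def var(term, sigma, eta, mu):
    """Simplified [var]: look the identifier up in a one-level stack."""
    return sigma[term], sigma, eta, mu
```

For example, `lift(expstat1(var), config, "t2")` rewrites only the term of thread `t2`, leaving the other threads, the event space and the store as they were.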

Metavariable convention. The metavariables used below (in variously decorated
form) in the rule schemes range as follows: $k \in$ Literal, $i \in$ Identifier,
$f \in$ FieldIdentifier, $m \in$ MethodIdentifier, $o \in$ Obj, $l \in$ LVal, $v \in$ RVal,
$V \in$ RVal*, $e \in$ A-Expression, $E \in$ A-Expression*, $\tau \in$ Type, $C \in$ ClassType,
$d \in$ A-VariableDeclarator, $D \in$ A-VariableDeclarator*, $s \in$ A-BlockStatement,
$S \in$ A-BlockStatement*, $b \in$ A-Block, $h \in$ A-CatchClause, $H \in$ A-CatchClause*,
$c \in$ Results, and $q \in$ AbruptResults.

3.4 “Silent” Actions 

We call Load, Store, Read and Write the “silent” actions because they may
spontaneously occur during the execution of a Java program without the intervention
of any thread's execution engine (no term evaluation). In some cases
such an occurrence is subject to the previous occurrence of other actions. In
the operational semantics, the relevant “historical” information is recorded in a
configuration's event space. Note that, given an event space $\eta$ and an action $a$,
a rule $\eta \to \eta \oplus a$ can be fired only if $\eta \oplus a$ is defined, and hence only if the
occurrence of $a$ in $\eta$ complies with the requirements of the language specification. This
point is crucial for a correct understanding of the rules [read, load, store, write]
for silent actions given in Table 3, as well as [assign5, access3] of Table 4 and
[syn2, syn4] of Table 8.

The same argument explains how the [store] rule is able to “guess” the right
value to be stored: the axioms (6) and (9) of Section 2 guarantee that the
apparently arbitrary value $v$ in $\eta \to \eta \oplus (\mathit{Store}, \theta, l, v)$ is in fact the latest value
assigned by $\theta$ to $l$. In Section 4, changing the event space axioms, we let [store]
make a real guess on $v$, by looking “presciently” into the future.



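\[
\begin{array}{ll}
\text{[read]}^{1} & T, \eta, \mu \to T,\ \eta \oplus (\mathit{Read}, \theta, l, \mu(l)),\ \mu \\
\text{[load]}^{1} & T, \eta \to T,\ \eta \oplus (\mathit{Load}, \theta, l, v) \\
\text{[store]}^{1} & T, \eta \to T,\ \eta \oplus (\mathit{Store}, \theta, l, v) \\
\text{[write]}^{1} & T, \eta, \mu \to T,\ \eta \oplus (\mathit{Write}, \theta, l, v),\ \mu[l \mapsto v]
\end{array}
\]

$^{1}$ if $T(\theta)$ is defined

Table 3. “Silent” actions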



3.5 Expressions 

Table 4 contains the rules for expressions. 

To evaluate the assignment to an instance variable successfully, the left hand
side is evaluated first by repeatedly applying [assign1], until a left value is produced.
Then the right hand side is evaluated by [assign3], and the assignment of
the resulting value is recorded in the event space by [assign5]. Note that [assign1]
does not apply to an assignment $e_1 = e$ when $e_1$ is a left value $l$ because, even
though $l$ may further evaluate to a right value $v$ by [access3], $v = e$ would not be
a legal abstract term. The same argument applies below to rules such as [syn3]
and so forth. Note that evaluating $\mathtt{null}.f$ to $\mathit{throw}(o)$ in rule [access2] would
not allow exceptions thrown in the left hand side of an assignment to propagate
outward in the structure of the program (see Section 3.8). To wit, $\mathit{throw}(o)$ is an
expression result while $\mathit{throw}(o).f$ can be viewed as an “A-LeftHandSide result.”

The rules [assign2] and [assign4] deal with assignments to local variables. In
the present semantics an attempt to access a field of the null object raises a
NullPointerException [access2]. A more elaborate treatment is required when
static fields are considered (see [10, §15.10.1]).

The evaluation of a method invocation $e.m(e_1, \ldots, e_k)$ is done in three steps.
First $e, e_1, \ldots, e_k$ are evaluated (in this order). If evaluation is successful, the
actual method to be invoked is then determined from $m$ and from the type of
the object denoted by $e$. We deal with non-successful evaluations in Section 3.8.
Finally, the actual method call is performed. We assume that the run-time
retrieval of methods is performed by a function

\[ \mathit{methodBody} : \mathit{ClassType} \times \mathit{MethodIdentifier} \times \mathit{Store} \rightharpoonup \text{A-Block} \times \mathit{Identifier}^* \]

which receives in input the class of the object for which the method is being
invoked, a method identifier $m$ and a store (containing the class declarations),
and returns, together with the body of $m$, the list of its formal parameters. This
function is partial: $\mathit{methodBody}(C, m, \mu)$, where $m = (i, C', T, \tau)$, is undefined if
no user-defined method $i$ with signature $T \to \tau$ can be found in $\mu$, inspecting the
classes which lie between $C$ and $C'$ in the class hierarchy. In that case $m$ could
still be a Java built-in method, like start or stop, otherwise a compile-time
error would have occurred. Separate operational rules are provided for built-in
methods (see Table 12 for example). Note that all such rules are subject to the
condition that $\mathit{methodBody}$ is undefined (which it must be for final methods),
thus implementing method overriding.

Method calls produce activation frames, the elements of ActivationFrame
in Table 1. The block of a frame represents the body of the invoked method.
Activation frames are produced at run-time by the function

\[ \mathit{frame} : \mathit{Obj} \times \mathit{MethodIdentifier} \times \mathit{RVal}^* \times \mathit{Store} \rightharpoonup \mathit{ActivationFrame} \]

defined as follows: $\mathit{frame}(o, m, V, \mu) = (m, \{S\}_{\rho[\mathtt{this} \mapsto o][I \mapsto V]})$ for an object $o$
of type $C$, if $\mathit{methodBody}(C, m, \mu) = (\{S\}_\rho, I)$; otherwise it is undefined. Note
that, since the type of the null object has no name (see [10, §4.1]), $\mathit{frame}$ is always
undefined when applied to null. Since it is the “static” information contained
in $\mu$ which is used by $\mathit{frame}$, we generally leave this parameter implicit. The
operational rules for evaluating activation frames are given in Table 5.
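The run-time lookup and the construction of an activation frame can be sketched as follows. This is an illustration under assumptions that are not in the paper: the static part of the store is a Python dictionary `CLASSES` (a hypothetical class table), a method identifier is the tuple `(name, declaring_class, param_types, result_type)`, objects are pairs `(class_name, id)`, `None` stands both for null and for "undefined", and method bodies are opaque strings paired with an environment.

```python
# Hypothetical class table standing in for the static information in the store.
CLASSES = {
    "Object": {"super": None, "methods": {}},
    "A": {"super": "Object",
          "methods": {("f", ("int",), "int"): (["x"], ("return x;", {}))}},
    "B": {"super": "A", "methods": {}},
    "C": {"super": "B",
          "methods": {("f", ("int",), "int"): (["x"], ("return x + 1;", {}))}},
}


def method_body(dynamic_class, m, classes=CLASSES):
    """Search from the dynamic class up to the class of the compile-time declaration."""
    name, declaring_class, param_types, result_type = m
    cls = dynamic_class
    while cls is not None:
        found = classes[cls]["methods"].get((name, param_types, result_type))
        if found is not None:
            params, body = found
            return body, params
        if cls == declaring_class:
            break                      # nothing between C and C': undefined
        cls = classes[cls]["super"]
    return None


def frame(o, m, values, classes=CLASSES):
    """Build (m, {S}_rho[this |-> o][I |-> V]); None where frame is undefined."""
    if o is None:                      # the null object has no class name
        return None
    cls, _ident = o
    found = method_body(cls, m, classes)
    if found is None:
        return None
    (statements, rho), params = found
    env = {**rho, "this": o, **dict(zip(params, values))}
    return (m, (statements, env))
```

With `m = ("f", "A", ("int",), "int")`, an object of class `C` gets the overriding body declared in `C`, while an object of class `B` inherits the body declared in `A`.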

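Start configuration. Let $C$ be the only class in a program called P to be public,
and let the compilation of P produce an initial store $\mu_\emptyset$ recording all relevant
type information. Let $C$ have a method main with a string parameter (this is
a simplifying assumption: Java requires an array of strings, but arrays are not
treated in this paper). We understand that a command line “java P arg” given
as input to the computer produces a start configuration

\[ (\theta, (\mathit{main}, \{ S \}_{\rho[i \mapsto v]}), \sigma_\emptyset),\ \emptyset,\ \mu \]

where $\emptyset$ is the empty event space, $(\theta, \mu) = \mathit{new}(\mathtt{Thread}, \mu_\emptyset)$, $v = \mathit{value}(\mathtt{arg})$, and
$\mathit{methodBody}(C, \mathit{main}, \mu_\emptyset) = (\{ S \}_\rho, i)$.

\[
\begin{array}{ll}
\text{[assign1]} & \dfrac{e_1 \to e_2}{e_1 = e \to e_2 = e} \\[2ex]
\text{[assign2]} & \dfrac{e_1 \to e_2}{i = e_1 \to i = e_2} \\[2ex]
\text{[assign3]} & \dfrac{e_1 \to e_2}{l = e_1 \to l = e_2} \\[2ex]
\text{[assign4]} & i = v,\ \sigma \to v,\ \sigma[i \mapsto v] \\[1ex]
\text{[assign5]} & (\theta, l = v),\ \eta \to (\theta, v),\ \eta \oplus (\mathit{Assign}, \theta, l, v) \\[1ex]
\text{[access1]} & \dfrac{e_1 \to e_2}{e_1.f \to e_2.f} \\[2ex]
\text{[access2]}^{1} & \mathtt{null}.f,\ \mu \to \mathit{throw}(o).f,\ \mu' \\[1ex]
\text{[access3]} & (\theta, l),\ \eta \to (\theta, v),\ \eta \oplus (\mathit{Use}, \theta, l, v) \\[1ex]
\text{[this]} & \mathtt{this},\ \sigma \to \sigma(\mathtt{this}),\ \sigma \\[1ex]
\text{[var]} & i,\ \sigma \to \sigma(i),\ \sigma \\[1ex]
\text{[new]} & \mathtt{new}\ C(),\ \mu \to \mathit{new}(C, \mu) \\[1ex]
\text{[lit]} & k \to \mathit{value}(k) \\[1ex]
\text{[unop1]} & \dfrac{e_1 \to e_2}{\mathit{op}\ e_1 \to \mathit{op}\ e_2} \\[2ex]
\text{[unop2]} & \mathit{op}\ v \to \mathit{op}(v) \\[1ex]
\text{[binop1]} & \dfrac{e_1 \to e_2}{e_1\ \mathit{bop}\ e \to e_2\ \mathit{bop}\ e} \\[2ex]
\text{[binop2]} & \dfrac{e_1 \to e_2}{v\ \mathit{bop}\ e_1 \to v\ \mathit{bop}\ e_2} \\[2ex]
\text{[binop3]} & v_1\ \mathit{bop}\ v_2 \to \mathit{bop}(v_1, v_2) \\[1ex]
\text{[parseq1]} & \dfrac{e_1 \to e_2}{e_1\ E \to e_2\ E} \\[2ex]
\text{[parseq2]} & \dfrac{E_1 \to E_2}{v\ E_1 \to v\ E_2} \\[2ex]
\text{[call1]} & \dfrac{e_1 \to e_2}{e_1.m(E) \to e_2.m(E)} \\[2ex]
\text{[call2]} & \dfrac{E_1 \to E_2}{o.m(E_1) \to o.m(E_2)} \\[2ex]
\text{[call3]} & o.m(V) \to \mathit{frame}(o, m, V) \\[1ex]
\text{[call4]}^{1} & \mathtt{null}.m(V),\ \mu \to \mathit{throw}(o),\ \mu'
\end{array}
\]

$^{1}$ where $(o, \mu') = \mathit{new}(\mathtt{NullPointerException}, \mu)$

Table 4. Expressions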
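The purely arithmetic fragment of Table 4 can be animated with a few lines of code. The sketch assumes an encoding that is not in the paper: right values are Python integers, a literal is `("lit", text)`, and operator applications are `("unop", op, e)` and `("bop", op, e1, e2)`; the operator tables `BOP` and `UNOP` are illustrative stand-ins for the semantic functions $\mathit{bop}$ and $\mathit{op}$.

```python
import operator

BOP = {"+": operator.add, "-": operator.sub, "*": operator.mul}
UNOP = {"-": operator.neg}


def is_value(t):
    return isinstance(t, int)


def step(t):
    """Perform exactly one transition of the arithmetic fragment of Table 4."""
    tag = t[0]
    if tag == "lit":                                   # [lit]
        return int(t[1])
    if tag == "unop":
        _, op, e = t
        if is_value(e):
            return UNOP[op](e)                         # [unop2]
        return ("unop", op, step(e))                   # [unop1]
    if tag == "bop":
        _, op, e1, e2 = t
        if not is_value(e1):
            return ("bop", op, step(e1), e2)           # [binop1]
        if not is_value(e2):
            return ("bop", op, e1, step(e2))           # [binop2]
        return BOP[op](e1, e2)                         # [binop3]
    raise ValueError("no rule applies")


def evaluate(t):
    """Return the whole sequence of terms from t down to a right value."""
    trace = [t]
    while not is_value(t):
        t = step(t)
        trace.append(t)
    return trace
```

Because [binop2] requires the left operand to be a value already, the evaluator reduces operands strictly from left to right, as the rules prescribe.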



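\[
\begin{array}{ll}
\text{[frame]} & \dfrac{b_1 \to b_2}{(m, b_1) \to (m, b_2)} \\[2ex]
\text{[exit1]} & (m, \{\,\}); \to * \\[1ex]
\text{[exit2]} & (m, \{\ \mathtt{return}\ S\ \}); \to * \\[1ex]
\text{[exit3]} & (m, \{\ \mathtt{return}\ v\ S\ \}) \to v
\end{array}
\]

Table 5. Activation frames

\[
\begin{array}{ll}
\text{[decl]} & \dfrac{e_1 \to e_2}{\tau\ i = e_1\ D; \to \tau\ i = e_2\ D;} \\[2ex]
\text{[locvardecl1]} & \tau\ i = v\ d\ D;,\ \sigma \to \tau\ d\ D;,\ \sigma[i = v] \\[1ex]
\text{[locvardecl2]} & \tau\ i = v;,\ \sigma \to *,\ \sigma[i = v]
\end{array}
\]

Table 6. Local variable declarations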



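\[
\begin{array}{ll}
\text{[expstat1]} & \dfrac{e_1 \to e_2}{e_1; \to e_2;} \\[2ex]
\text{[expstat2]} & v; \to * \\[1ex]
\text{[skip]} & ; \to * \\[1ex]
\text{[if1]} & \dfrac{e_1 \to e_2}{\mathtt{if}\ (e_1)\ s \to \mathtt{if}\ (e_2)\ s} \\[2ex]
\text{[if2]} & \mathtt{if}\ (\mathit{true})\ s \to s \\[1ex]
\text{[if3]} & \mathtt{if}\ (\mathit{false})\ s \to *
\end{array}
\]

Table 7. Expression statements, skip and conditional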



\[
\begin{array}{ll}
\text{[statseq]} & \dfrac{s_1 \to s_2}{s_1\ S \to s_2\ S} \\[2ex]
\text{[*]} & *\ S \to S \\[1ex]
\text{[block1]} & \{\,\}_\rho \to * \\[1ex]
\text{[block2]} & \dfrac{S_1,\ \mathit{push}(\rho_1, \sigma_1) \to S_2,\ \mathit{push}(\rho_2, \sigma_2)}{\{S_1\}_{\rho_1},\ \sigma_1 \to \{S_2\}_{\rho_2},\ \sigma_2} \\[2ex]
\text{[syn1]}^{1} & \dfrac{e_1 \to e_2}{\mathtt{synchronized}\ (e_1)\ b \to \mathtt{synchronized}\ (e_2)\ b} \\[2ex]
\text{[syn2]} & \dfrac{e,\ \eta_1 \to o,\ \eta_2}{(\theta, \mathtt{synchronized}\ (e)\ b),\ \eta_1 \to \mathtt{synchronized}\ (o)\ b,\ \eta_2 \oplus (\mathit{Lock}, \theta, o)} \\[2ex]
\text{[syn3]} & \dfrac{b_1 \to b_2}{\mathtt{synchronized}\ (o)\ b_1 \to \mathtt{synchronized}\ (o)\ b_2} \\[2ex]
\text{[syn4]} & \dfrac{b,\ \eta_1 \to c,\ \eta_2}{(\theta, \mathtt{synchronized}\ (o)\ b),\ \eta_1 \to c,\ \eta_2 \oplus (\mathit{Unlock}, \theta, o)}
\end{array}
\]

$^{1}$ if $e_2 \notin \mathit{RVal}$

Table 8. Blocks and synchronization

3.6 Local Variable Declarations

The rules for local variable declarations are given in Table 6.

3.7 Statements

Table 7 contains the rules for expression statements, skip and conditional statements.
Table 8 contains the rules for blocks and synchronization. The statements
for control manipulation (return and exception try) are treated in Section 3.8.

Example. Consider the two threads $\theta_1$ and $\theta_2$ of Example 2.3 running in parallel
with initially empty working memories, empty event space $\emptyset$, and stacks mapping
the local variable $p$ to $o$. We write $t_2$ for the portion of program run by $\theta_2$. In the
example $\theta_1$ enters its synchronized block first. Its evaluation is described in
Figure 2, where stacks are omitted.



3.8 Control mechanisms 

In Java, the evaluation of expressions and statements may have a normal or 
an abrupt completion. Abrupt completion may be caused by the occurrence 
of an exceptional situation during execution, such as an attempt to divide an 
integer by 0; it can also be forced by the program by means of a throw or a 
return statement. For example, the execution of throw e;, where the expression 
e evaluates to some object o, throws an exception “with reason o” to be caught 
by the nearest dynamically-enclosing catch clause of a try statement (see [10, 
§11.3]). Similarly, the execution of return e; returns control, together with the 
value of e, to the nearest dynamically-enclosing activation frame. 

The interactions between these two mechanisms are described in [10, §14.15, 
§14.16, §14.18], to which we refer for more detail. The rules for exception han- 
dling are given in Table 9 and Table 11. Uncaught exceptions are not treated in 
the present paper. 

Some of the rules for the try statement include a finally clause written 
in square brackets, to be regarded as “optional:” the brackets indicate that 
the clause should be ignored if the statement at hand has no finally block. 
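A similar convention is adopted for the return statements and results, where
$\mathtt{return}\ [v];$ accounts for both cases where some value $v$ is and is not returned
(and similarly for the results).

Table 10 contains a grammar of syntactic contexts which pop control out
upon occurrence of an abrupt evaluation result, with no further ado. Contexts
of the form $\vartheta[\_]$, called “pop-out” contexts, are used in the rule scheme [pop]
to propagate abrupt evaluation results outwards through the structure of a program.
All syntactic constructs which are not represented in a pop-out context
respond to such results with some computational action described by a separate
semantic rule. Examples of such constructs are the synchronized and the try
statements.

\[
\begin{array}{ll}
\text{[pop]}^{1} & \vartheta[q] \to q \\[1ex]
\text{[exit4]} & (m, \{\ \mathit{throw}(o)\ S\ \}) \to \mathit{throw}(o) \\[1ex]
\text{[ret1]} & \dfrac{e_1 \to e_2}{\mathtt{return}\ e_1; \to \mathtt{return}\ e_2;} \\[2ex]
\text{[ret2]} & \mathtt{return}\ [v]; \to \mathtt{return}\ [v] \\[1ex]
\text{[throw1]} & \dfrac{e_1 \to e_2}{\mathtt{throw}\ e_1; \to \mathtt{throw}\ e_2;} \\[2ex]
\text{[throw2]} & \mathtt{throw}\ o; \to \mathit{throw}(o) \\[1ex]
\text{[try1]} & \mathtt{try}\ \{\,\}\ H \to * \\[1ex]
\text{[try2]} & \mathtt{try}\ \{\ \mathtt{return}\ [v]\ S\ \}\ H\ [\mathtt{finally}\ \{\,\}] \to \mathtt{return}\ [v] \\[1ex]
\text{[try3]}^{2} & \dfrac{b_1 \to b_2}{\mathtt{try}\ b_1\ H\ [\mathtt{finally}\ b] \to \mathtt{try}\ b_2\ H\ [\mathtt{finally}\ b]} \\[2ex]
\text{[try4]}^{3} & \dfrac{b \to \{\ \mathit{throw}(o)\ S\ \}}{\begin{array}{l}\mathtt{try}\ b\ \mathtt{catch}\ (\tau\ i)\ \{ S' \}_{\rho}\ H\ [\mathtt{finally}\ b'] \to \\ \quad \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ \{ S' \}_{\rho[i \mapsto o]}\ H\ [\mathtt{finally}\ b']\end{array}} \\[4ex]
\text{[try5]}^{3} & \dfrac{b_1 \to b_2}{\begin{array}{l}\mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ b_1\ H\ [\mathtt{finally}\ b] \to \\ \quad \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ b_2\ H\ [\mathtt{finally}\ b]\end{array}} \\[4ex]
\text{[try6]}^{3} & \dfrac{b \to c}{\mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ b\ H \to c} \\[2ex]
\text{[try7]}^{4} & \dfrac{b \to \{\ \mathit{throw}(o)\ S\ \}}{\mathtt{try}\ b\ \mathtt{catch}\ (\tau\ i)\ b' \to \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ b'} \\[2ex]
\text{[try8]}^{4} & \dfrac{\mathtt{try}\ b\ H_1\ [\mathtt{finally}\ b_1] \to \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ H_2\ [\mathtt{finally}\ b_2]}{\begin{array}{l}\mathtt{try}\ b\ \mathtt{catch}\ (\tau\ i)\ b'\ H_1\ [\mathtt{finally}\ b_1] \to \\ \quad \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ b'\ H_2\ [\mathtt{finally}\ b_2]\end{array}} \\[4ex]
\text{[try9]}^{4} & \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ b\ [\mathtt{finally}\ \{\,\}] \to \mathit{throw}(o) \\[1ex]
\text{[try10]}^{4} & \dfrac{\mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ H\ [\mathtt{finally}\ b] \to c}{\mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ b'\ H\ [\mathtt{finally}\ b] \to c}
\end{array}
\]

$^{1}$ where $\vartheta[\_]$ is a “pop-out” context
$^{2}$ if $b_2 \neq \{\ \mathit{throw}(o)\ S\ \}$
$^{3}$ if $o \in \tau$
$^{4}$ if $o \notin \tau$

Table 9. Exceptions and return

\[
\begin{array}{rcl}
\vartheta[\_] & ::= & [\_].f = e \mid i = [\_] \mid l = [\_] \\
 & \mid & \mathit{op}\ [\_] \mid [\_]\ \mathit{bop}\ e \mid v\ \mathit{bop}\ [\_] \\
 & \mid & [\_].f \mid [\_].m(E) \mid o.m(\xi[\_]) \\
 & \mid & \tau\ i = [\_]\ D; \mid [\_]; \mid \{\ [\_]\ S\ \} \\
 & \mid & \mathtt{if}\ (\,[\_]\,)\ s \mid \mathtt{return}\ [\_]; \\
 & \mid & \mathtt{throw}\ [\_]; \mid \mathtt{synchronized}\ (\,[\_]\,)\ b \\
 & \mid & \mathtt{try}\ \{\,\}\ H\ \mathtt{finally}\ \{\ [\_]\ S\ \} \\
 & \mid & \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{finally}\ \{\ [\_]\ S'\ \} \\
 & \mid & \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ \{\ [\_]\ S'\ \}\ H\ \mathtt{finally}\ \{\,\} \quad \text{if } o \in \tau \\
 & \mid & \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ \{\ q\ S'\ \}\ H\ \mathtt{finally}\ \{\ [\_]\ S''\ \} \quad \text{if } o \in \tau \\
 & \mid & \mathtt{try}\ \{\ \mathtt{return}\ [v]\ S\ \}\ H\ \mathtt{finally}\ \{\ [\_]\ S'\ \} \\[1ex]
\xi[\_] & ::= & [\_]\ E \mid v\ \xi[\_]
\end{array}
\]

Table 10. “Pop-out” contexts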
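A single application of [pop] can be sketched for a handful of the contexts of Table 10. The encoding is an assumption made for illustration only: abrupt results are tuples tagged `"throw"` or `"return"`, right values are integers, and terms are tagged tuples such as `("bop", op, e1, e2)`, `("expstat", e)`, `("if", e, s)` and `("sync", e, b)`. The function returns `None` when the term is not a pop-out context filled with an abrupt result.

```python
def is_value(t):
    return isinstance(t, int)


def is_abrupt(t):
    return isinstance(t, tuple) and t[0] in ("throw", "return")


def pop(t):
    """Apply [pop] once: return q if t has the form context[q], else None."""
    tag = t[0]
    if tag == "bop":
        _, _op, left, right = t
        if is_abrupt(left):                       # [_] bop e
            return left
        if is_value(left) and is_abrupt(right):   # v bop [_]
            return right
    elif tag == "unop" and is_abrupt(t[2]):       # op [_]
        return t[2]
    elif tag == "expstat" and is_abrupt(t[1]):    # [_];
        return t[1]
    elif tag == "if" and is_abrupt(t[1]):         # if ([_]) s
        return t[1]
    elif tag == "sync" and is_abrupt(t[1]):       # synchronized ([_]) b
        return t[1]
    # An abrupt result in the *body* of a synchronized statement is deliberately
    # not handled here: that case needs [syn4], which also records an Unlock.
    return None
```

The last comment reflects the remark above that synchronized (and try) statements respond to abrupt results with an action of their own instead of simply passing them on.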



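\[
\begin{array}{ll}
\text{[fin1]} & \dfrac{b_1 \to b_2}{\mathtt{try}\ \{\,\}\ H\ \mathtt{finally}\ b_1 \to \mathtt{try}\ \{\,\}\ H\ \mathtt{finally}\ b_2} \\[2ex]
\text{[fin2]} & \dfrac{b \to c}{\mathtt{try}\ \{\,\}\ H\ \mathtt{finally}\ b \to c} \\[2ex]
\text{[fin3]} & \dfrac{b_1 \to b_2}{\begin{array}{l}\mathtt{try}\ \{\ \mathtt{return}\ [v]\ S\ \}\ H\ \mathtt{finally}\ b_1 \to \\ \quad \mathtt{try}\ \{\ \mathtt{return}\ [v]\ S\ \}\ H\ \mathtt{finally}\ b_2\end{array}} \\[4ex]
\text{[fin4]}^{1} & \dfrac{b_1 \to b_2}{\begin{array}{l}\mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ \{\,\}\ H\ \mathtt{finally}\ b_1 \to \\ \quad \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ \{\,\}\ H\ \mathtt{finally}\ b_2\end{array}} \\[4ex]
\text{[fin5]}^{1} & \dfrac{b \to c}{\mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ \{\,\}\ H\ \mathtt{finally}\ b \to c} \\[2ex]
\text{[fin6]}^{1} & \dfrac{b_1 \to b_2}{\begin{array}{l}\mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ \{\ q\ S'\ \}\ H\ \mathtt{finally}\ b_1 \to \\ \quad \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{catch}\ (\tau\ i)\ \{\ q\ S'\ \}\ H\ \mathtt{finally}\ b_2\end{array}} \\[4ex]
\text{[fin7]} & \dfrac{b \to \{\ \mathit{throw}(o)\ S\ \}}{\mathtt{try}\ b\ \mathtt{finally}\ b' \to \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{finally}\ b'} \\[2ex]
\text{[fin8]} & \dfrac{b_1 \to b_2}{\mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{finally}\ b_1 \to \mathtt{try}\ \{\ \mathit{throw}(o)\ S\ \}\ \mathtt{finally}\ b_2}
\end{array}
\]

$^{1}$ if $o \in \tau$

Table 11. finally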
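Read together, the try and finally rules and the pop-out contexts determine which result a whole try statement eventually produces. The function below is a derived, big-step summary and not a rule of the paper; it assumes a single catch clause, encodes normal completion as `"*"` and abrupt results as tuples tagged `"throw"` or `"return"`, and takes the results of the sub-blocks as inputs instead of evaluating them.

```python
def is_abrupt(result):
    return isinstance(result, tuple) and result[0] in ("throw", "return")


def try_outcome(body, caught=False, handler="*", fin="*"):
    """Final result of try/catch/finally given the results of its parts.

    body    -- result of the try block
    caught  -- whether the thrown object belongs to the catch type (o in tau)
    handler -- result of the catch block, used only when the exception is caught
    fin     -- result of the finally block ("*" also covers a missing block)
    """
    if is_abrupt(body) and body[0] == "throw" and caught:
        pending = handler        # [try4]-[try6]: the handler takes over
    else:
        pending = body           # [try1], [try2], [try9]: the result stands
    # An abrupt finally block overrides whatever was pending (pop-out contexts
    # with the hole inside the finally block); otherwise the pending result wins.
    return fin if is_abrupt(fin) else pending
```

For instance, `try_outcome(("throw", "o"), caught=False, fin=("return", 2))` yields `("return", 2)`: the return in the finally block discards the uncaught exception.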
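\[
\begin{array}{l}
(\theta_1, \texttt{synchronized(p)\{p.y = 2;\}} \ldots) \mid (\theta_2, t_2),\ \emptyset,\ \mu \\
\quad \to \text{by [statseq, syn2, var]} \\
(\theta_1, \texttt{synchronized(}o\texttt{)\{p.y = 2;\}} \ldots) \mid (\theta_2, t_2),\ \{(\mathit{Lock}, \theta_1, o)\},\ \mu \\
\quad \to \text{by [statseq, syn3, block2, statseq, expstat1, assign1, access1, var]} \\
(\theta_1, \texttt{synchronized(}o\texttt{)\{}o\texttt{.y = 2;\}} \ldots) \mid (\theta_2, t_2),\ \{(\mathit{Lock}, \theta_1, o)\},\ \mu \\
\quad \to \text{by [statseq, syn3, block2, statseq, expstat1, assign3, lit]} \\
(\theta_1, \texttt{synchronized(}o\texttt{)\{}o\texttt{.y = 2;\}} \ldots) \mid (\theta_2, t_2),\ \{(\mathit{Lock}, \theta_1, o)\},\ \mu \\
\quad \to \text{by [statseq, syn3, block2, statseq, expstat1, assign5]} \\
(\theta_1, \texttt{synchronized(}o\texttt{)\{2;\}} \ldots) \mid (\theta_2, t_2),\ \{\cdots < (\mathit{Assign}, \theta_1, o.y, 2)\},\ \mu \\
\quad \to \text{by [statseq, syn3, block2, statseq, expstat2]} \\
(\theta_1, \texttt{synchronized(}o\texttt{)\{}*\texttt{\}} \ldots) \mid (\theta_2, t_2),\ \{\cdots < (\mathit{Assign}, \theta_1, o.y, 2)\},\ \mu \\
\quad \to \text{by [statseq, syn3, block2, *]} \\
(\theta_1, \texttt{synchronized(}o\texttt{)\{\}} \ldots) \mid (\theta_2, t_2),\ \{\cdots < (\mathit{Assign}, \theta_1, o.y, 2)\},\ \mu \\
\quad \to \text{by [store]} \\
(\theta_1, \texttt{synchronized(}o\texttt{)\{\}} \ldots) \mid (\theta_2, t_2),\ \{\cdots < (\mathit{Store}, \theta_1, o.y, 2)\},\ \mu \\
\quad \to \text{by [write]} \\
(\theta_1, \texttt{synchronized(}o\texttt{)\{\}} \ldots) \mid (\theta_2, t_2),\ \{\cdots < (\mathit{Write}, \theta_1, o.y, 2)\},\ \mu[o.y \mapsto 2] \\
\quad \to \text{by [statseq, syn4, block1]} \\
(\theta_1, *\ \texttt{a = p.x;} \ldots) \mid (\theta_2, t_2),\ \{\cdots < (\mathit{Unlock}, \theta_1, o)\},\ \mu[o.y \mapsto 2] \\
\quad \to \text{by [*]} \\
(\theta_1, \texttt{a = p.x;} \ldots) \mid (\theta_2, t_2),\ \{\cdots < (\mathit{Unlock}, \theta_1, o)\},\ \mu[o.y \mapsto 2] \\
(\ldots)
\end{array}
\]

Figure 2. Run of Example 2.3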

3.9 Starting and Stopping Threads 

The notion of configuration introduced in Section 3.3 is extended here to include
a set $\Theta$ of thread identifiers, whose elements identify threads which are bound
to stop. We write $\Theta \mid \theta$ for $\Theta \cup \{\theta\}$ when we assume that $\theta$ is not in $\Theta$. A
configuration is now redefined to be a 4-tuple of the form:

\[ (T, \Theta, \eta, \mu). \]

All operational rules introduced so far have no interaction with the mechanism
for stopping threads; in view of the conventions introduced in Section 3.3, by
which parts of a configuration may be left implicit when not directly involved in
the evaluation, the rules of the previous sections can be read with no editing in
the new operational setting with $\Theta$.

Table 12 presents the rules for the methods start() and stop() of the class
Thread. The interplay of stopping threads and Java's notification system is
discussed in Section 3.10.



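\[
\begin{array}{ll}
\text{[start1]}^{1,2} & \theta.\mathtt{start}();,\ \Theta \to * \mid (\theta, \mathit{frame}(\theta, \mathit{run}, ());, \sigma_\emptyset),\ \Theta \\[1ex]
\text{[start2]}^{1,3} & \theta.\mathtt{start}();,\ \Theta \to * \mid (\theta, *, \sigma_\emptyset),\ \Theta \setminus \{\theta\} \\[1ex]
\text{[start3]}^{1,4,5} & T \mid (\theta', \theta.\mathtt{start}();),\ \mu \to T \mid (\theta', \mathit{throw}(o)),\ \mu' \\[1ex]
\text{[stop1]} & \theta.\mathtt{stop}();,\ \Theta \to *,\ \Theta \cup \{\theta\} \\[1ex]
\text{[stop2]}^{6} & (\theta, t),\ \Theta \mid \theta,\ \mu \to (\theta, \mathit{throw}(o)),\ \Theta,\ \mu'
\end{array}
\]

$^{1}$ if $\mathit{frame}(\theta, \mathit{start}, ())$ is undefined
$^{2}$ if $\theta \notin \Theta$
$^{3}$ if $\theta \in \Theta$ or $\mathit{frame}(\theta, \mathit{run}, ())$ is undefined
$^{4}$ if $T(\theta)$ is defined or $\theta = \theta'$
$^{5}$ where $(o, \mu') = \mathit{new}(\mathtt{IllegalThreadStateException}, \mu)$
$^{6}$ where $t$ is a redex and $(o, \mu') = \mathit{new}(\mathtt{ThreadDeath}, \mu)$

Table 12. start() and stop()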



The rules [start1], [start2] and [start3] can only be applied if the method
start() has not been overridden, that is if $\mathit{frame}(\theta, \mathit{start}, ())$ is undefined.
Since stop() is declared as final in class Thread and thus cannot be redefined,
no analogous side condition is required in the rules for stop(). The rule [start1]
only applies if no thread with the same identifier as the one to be started is
currently running; this is implicit in the use of “$\mid$”. If such a thread identifier
exists an IllegalThreadStateException is thrown by [start3].

If a thread $\theta$ is started and $\mathit{frame}(\theta, \mathit{run}, ())$ is undefined, the built-in run
method of the class Thread is invoked. The latter simply calls the run method
of $\theta$'s run object, that is the runnable object given as argument to the expression
that created $\theta$ [10, §20.20], if such an object exists, and does nothing otherwise
[10, §20.20.13]. Since, for simplicity, we only consider class instance creation
expressions with empty parameter list, and hence have no run objects associated
with threads, $\theta$ does nothing when started if $\mathit{frame}(\theta, \mathit{run}, ())$ is undefined. This
explains [start2]. This rule also captures the case of a thread which has been
stopped before having ever been started (indeed possible in Java [10, §20.20.15]).
If the thread is eventually started, it will immediately terminate and its name
will be removed from $\Theta$.
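The case analysis behind [start1], [start2], [start3] and [stop1] can be sketched as below. The encoding is assumed for illustration: `threads` maps thread identifiers to terms, `stopping` plays the role of $\Theta$, and `run_frames` is a hypothetical table that holds $\mathit{frame}(\theta, \mathit{run}, ())$ for exactly those threads where it is defined. The store is left out, so the exception object is represented by its class name only.

```python
def start(threads, stopping, caller, theta, run_frames):
    """`caller` executes theta.start(); returns the new (threads, stopping)."""
    threads = dict(threads)
    if theta in threads or theta == caller:              # [start3]
        threads[caller] = ("throw", "IllegalThreadStateException")
        return threads, stopping
    threads[caller] = "*"
    if theta not in stopping and theta in run_frames:    # [start1]
        threads[theta] = run_frames[theta]
        return threads, stopping
    threads[theta] = "*"                                 # [start2]
    return threads, stopping - {theta}


def stop(threads, stopping, caller, theta):
    """`caller` executes theta.stop(); rule [stop1] only marks theta as stopping."""
    threads = dict(threads)
    threads[caller] = "*"
    return threads, stopping | {theta}
```

Calling `stop` before `start` for the same identifier exercises the situation described above: the thread is created already terminated and its identifier leaves the stop set.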

As a result of the invocation of a stop method of class Thread an asyn- 
chronous exception is thrown. Java allows a small but bounded amount of 
execution to occur between the method call and the actual throw of the excep- 
tion [10, §11.3.2]. We allow such execution to be arbitrarily long: at any time 
during execution a thread whose stop method has been invoked (by [stopl]) 
may decide that the time has come to throw a ThreadDeath exception. The 
exception is thrown by [stop2] as deep inside the structure of the program as is 
necessary to allow a catch by a possibly enclosing try-catch statement. This is 
ensured by the side condition that $t$ is a redex. These are the terms of the form:

\[
\begin{array}{rcl}
\mathit{Redex} & ::= & i = v \mid l = v \mid \mathtt{null}.f \mid \mathtt{null}.f = e \mid l \mid \mathtt{this} \mid i \mid \mathtt{new}\ C(); \mid k \\
 & \mid & \mathit{op}\ v \mid v_1\ \mathit{bop}\ v_2 \mid o.m(V) \mid \tau\ i = v\ d\ D; \mid \tau\ i = v; \mid v; \mid\ ; \\
 & \mid & \mathtt{if}\ (v)\ s \mid \{\,\} \mid (m, \{\,\}) \mid \mathtt{throw}\ o; \mid \mathtt{return}\ [v]; \mid \mathtt{try}\ \{\,\} \\
 & \mid & \theta.\mathtt{start}(); \mid \theta.\mathtt{stop}(); \mid o.\mathtt{wait}(); \mid o.\mathtt{notify}();
\end{array}
\]

As $\mathit{throw}(o)$ and $\mathtt{return}\ [v]$ are not contained in this list of redexes, a thread cannot
stop as long as it is performing a transfer of control, i.e. performing pop-out rules.

A more committed policy for stopping threads may be adopted either by
requiring fairness on [stop2] or by enforcing such a condition by means of a
counter bounding the number of execution steps allowed before this rule is applied.

No rule removes threads from a configuration: when they finish execution, 
threads keep dwelling in an M-term together with the result that they produced. 



3.10 Wait and notification 

In Java every object has a “wait set.” A thread $\theta$ which owns at least one lock,
say $n$ locks, on an object $o$ can add itself to that object's wait set by invoking o.wait().
This thread would then lose all its locks on $o$ and lie dormant until some other
thread wakes it up by invoking o.notify(). Before resuming computation, $\theta$
must get its $n$ locks back, possibly competing with other threads in the usual
manner. When a thread goes to sleep in a wait set it is said to change its state
from running to waiting. When notified, such a thread changes its state from
waiting to notified, and finally from notified to running when it obtains its locks
back.

Let the letters $R$, $W$ and $N$ stand respectively for running, waiting and
notified. The notion of M-term introduced in Section 3.3 is extended here by
endowing each thread with a record of its state. The record of a running thread
consists just of the identifier $R$. The record of a thread which is waiting or
notified consists of a triple $(X, o, n)$, where $X$ is the identifier $W$ or $N$, $o$ is the
object on whose wait set the thread is waiting and $n$ is the number of locks that
the thread acquired on that object.



An M-term is now redefined to be a partial function mapping thread identifiers
to triples $(t, \varepsilon, \sigma)$, where $t$ and $\sigma$ are as before and $\varepsilon$ is a state record. The
notation $T \mid (\theta, t, \varepsilon, \sigma)$ extends that of Section 3.3 in the obvious way. When $\varepsilon$ is
a triple $(X, o, n)$ we write $T \mid (\theta, t, X, o, n, \sigma)$ for $T \mid (\theta, t, (X, o, n), \sigma)$ and omit
the parts that are not immediately relevant as usual when no confusion arises.

The operational rules introduced so far apply to M-terms of the new form by
agreeing that, unless otherwise specified, evaluation applies to running threads
(which can nevertheless change state when evaluated). More precisely: if the
state record of a thread is omitted in the left hand side of a judgement, then it
is understood to be $R$. For example, [expstat1] is now read

\[
\frac{T_1 \mid (\theta, e_1, R, \sigma_1), \Theta_1, \eta_1, \mu_1 \to T_2 \mid (\theta, e_2, \varepsilon, \sigma_2), \Theta_2, \eta_2, \mu_2}
     {T_1 \mid (\theta, e_1;, R, \sigma_1), \Theta_1, \eta_1, \mu_1 \to T_2 \mid (\theta, e_2;, \varepsilon, \sigma_2), \Theta_2, \eta_2, \mu_2}
\]

while [skip] is now read $T \mid (\theta, ;, R, \sigma), \Theta, \eta, \mu \to T \mid (\theta, *, R, \sigma), \Theta, \eta, \mu$. Moreover,
silent actions only apply to running threads; more precisely: the side condition
in Table 3 changes now to “if $T(\theta) = (t, R, \sigma)$.” Finally, threads run when
started, that is: the state of $\theta$ in the right hand side of [start1] and [start2] is $R$.



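[wait1]    $(\theta, \texttt{o.wait();}),\ \eta, \mu \longrightarrow (\theta, \texttt{throw}(o')),\ \eta, \mu'$    (side condition 1)

[wait2]    $(\theta, \texttt{o.wait();}, R),\ \eta \longrightarrow (\theta, *, W, o, n),\ \eta \oplus (\mathit{Unlock}, \theta, o)^n$    (side condition 2)

[notify1]  $(\theta, \texttt{o.notify();}),\ \eta, \mu \longrightarrow (\theta, \texttt{throw}(o')),\ \eta, \mu'$    (side condition 1)

[notify2]  $(\theta, \texttt{o.notify();}) \mid (\theta', t, W, o),\ \eta \longrightarrow (\theta, *) \mid (\theta', t, N, o),\ \eta$    (side condition 2)

[notify3]  $T \mid (\theta, \texttt{o.notify();}) \longrightarrow T \mid (\theta, *)$    (side condition 3)

[ready]    $(\theta, t, N, o, n),\ \eta \longrightarrow (\theta, t, R),\ \eta \oplus (\mathit{Lock}, \theta, o)^n$

[stop3]    $(\theta, t, W),\ \Theta \mid \theta \longrightarrow (\theta, t, N),\ \Theta \mid \theta$

Side conditions:
(1) if $\mathit{locks}(\theta, o, \eta) = 0$ and $(o', \mu') = \mathit{new}(\texttt{IllegalMonitorException}, \mu)$
(2) if $\mathit{locks}(\theta, o, \eta) = n > 0$
(3) if $T(\theta) \neq (W, o)$ for all $\theta$

Table 13. Wait sets and notification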



The rules for the notification system are given in Table 13. 

By the rules [wait1] and [notify1], an appropriate exception is thrown if a
thread attempts to operate on the wait set of an object on which it possesses no
locks. The expression $\mathit{locks}(\theta, o, \eta)$ denotes the number of locks that a thread
$\theta$ possesses on $o$ in an event space $\eta$ (the number of events $(\mathit{Lock}, \theta, o)$ with no
matching Unlock). By the rule [wait2] a thread $\theta$ can put itself in the wait set
of an object $o$. This step involves the release by $\theta$ of all its locks on $o$. Rule
[notify2] notifies a thread waiting in the wait set of an object $o$. Such a thread,
however, cannot run until all its locks on $o$ are restored. This is done by [ready].
Any notification on an object whose wait set is empty has no effect ([notify3]).
A waiting thread which has been stopped is woken up by [stop3].
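The state changes described by these rules can be sketched as a small executable model. The following Java class is only an illustration under simplifying assumptions, not part of the formal semantics: there is a single object o, the event space is linearised into a list, mutual exclusion between threads is not checked, and all names (`WaitSetSketch`, `doWait`, `doNotify`, `ready`) are hypothetical. The standard Java exception `IllegalMonitorStateException` stands in for the exception thrown by [wait1] and [notify1].

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of [wait1], [wait2], [notify1]-[notify3] and [ready] for one object o. */
final class WaitSetSketch {
    enum State { R, W, N }                          // running, waiting, notified

    /** State record of a thread: R alone, or (W|N, o, n) for the single object o. */
    static final class Record {
        State state = State.R;
        int savedLocks = 0;                         // the n of the triple (X, o, n)
    }

    final Map<String, Record> threads = new HashMap<String, Record>();
    final List<String> events = new ArrayList<String>();   // linearised event space

    void spawn(String theta)  { threads.put(theta, new Record()); }
    void lock(String theta)   { events.add("Lock:" + theta); }
    void unlock(String theta) { events.add("Unlock:" + theta); }

    /** locks(theta, o, eta): Lock events of theta with no matching Unlock. */
    int locks(String theta) {
        int n = 0;
        for (String e : events) {
            if (e.equals("Lock:" + theta)) n++;
            if (e.equals("Unlock:" + theta)) n--;
        }
        return n;
    }

    /** [wait1] / [wait2]: release all n locks and enter the wait set. */
    void doWait(String theta) {
        int n = locks(theta);
        if (n == 0) throw new IllegalMonitorStateException(theta);    // [wait1]
        for (int i = 0; i < n; i++) unlock(theta);                    // (Unlock, theta, o)^n
        Record r = threads.get(theta);
        r.state = State.W;
        r.savedLocks = n;
    }

    /** [notify1] / [notify2] / [notify3]: move one waiting thread to N, if any. */
    void doNotify(String theta) {
        if (locks(theta) == 0) throw new IllegalMonitorStateException(theta); // [notify1]
        for (Record r : threads.values()) {
            if (r.state == State.W) { r.state = State.N; return; }    // [notify2]
        }
        // [notify3]: the wait set is empty, so the notification has no effect
    }

    /** [ready]: a notified thread re-acquires its n locks and runs again. */
    void ready(String theta) {
        Record r = threads.get(theta);
        if (r.state != State.N) throw new IllegalStateException("thread is not notified");
        for (int i = 0; i < r.savedLocks; i++) lock(theta);           // (Lock, theta, o)^n
        r.state = State.R;
        r.savedLocks = 0;
    }
}
```

In this sketch a thread that waits with n locks passes through the states R, W, N and R again, and holds exactly n locks after `ready`, which mirrors the triple (X, o, n) carried in the state record.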

Example. Figure 3 illustrates the interaction of the rules for wait and notification.
Consider the M-term

$$(\theta, \texttt{synchronized(p) \{ if(c) p.wait(); \}}, \sigma) \mid (\theta', \texttt{synchronized(p) \{ p.notify(); \}}, \sigma')$$

Let $t = \texttt{synchronized(o) \{ * \}}$ and $t' = \texttt{synchronized(p) \{ p.notify(); \}}$, let $\emptyset$
be the empty event space and $\eta = \{(\mathit{Lock}, \theta, o) \le (\mathit{Unlock}, \theta, o)\}$; let $\sigma$ and $\sigma'$ be
stacks with $\sigma(p) = \sigma'(p) = o$ and $\sigma(c) = \mathit{true}$. The stacks, which do not change
during execution, are omitted in the figure.
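For comparison, the two threads of this example can be written as an ordinary Java program. The sketch below is an illustration only: the class name, the `resumed` flag and the polling loop are assumptions added so that the outcome can be observed and so that the notification is issued only after the first thread has entered the wait set (the interleaving shown in Figure 3). The helper `notifyWithoutLock` shows the Java-level counterpart of [notify1], where the standard exception is `IllegalMonitorStateException`.

```java
/** The example M-term as a plain Java program (illustrative sketch). */
final class WaitNotifyExample {
    static final Object p = new Object();
    static volatile boolean resumed = false;    // hypothetical flag, only for observation

    static void runExample() {
        final boolean c = true;                 // sigma(c) = true in the example
        Thread theta = new Thread(new Runnable() {
            @Override public void run() {
                synchronized (p) {
                    if (c) {
                        try { p.wait(); } catch (InterruptedException e) { return; }
                    }
                    resumed = true;             // reached only after the locks are restored
                }
            }
        });
        theta.start();
        // Wait until theta is in the wait set of p; a notification sent earlier would be lost.
        while (theta.getState() != Thread.State.WAITING
                && theta.getState() != Thread.State.TERMINATED) {
            Thread.yield();
        }
        synchronized (p) { p.notify(); }        // the second thread, theta'
        try {
            theta.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Calling notify() without holding the monitor throws, as in rule [notify1]. */
    static boolean notifyWithoutLock() {
        try {
            new Object().notify();
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        runExample();
        System.out.println("resumed = " + resumed
                + ", exception without lock = " + notifyWithoutLock());
    }
}
```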



4 Prescient Event Spaces 

The aim of this section is the formalization of the so-called “prescient stores” 
of [10, §17.8] in our event space semantics. The specification claims that the 
“prescient” semantics is conservative for “properly synchronized” programs. We 
also formalize the intuitive notion of “proper synchronization” and prove this 
claim. 

The prescient store actions are introduced in [10, §17.8, p. 408] as follows: 

“ . . . the store action [of variable V by thread T is allowed] to instead 
occur before the assign action, if the following restrictions are obeyed: 

— If the store action occurs, the assign is bound to occur. . . . 

— No lock action intervenes between the relocated store and the assign. 

— No load of V intervenes between the relocated store and the assign. 

— No other store of V intervenes between the relocated store and the assign. 

— The store action sends to the main memory the value that the assign 
action will put into the working memory of thread T. 

The last property inspires us to call such an early store action prescient: ...” 
This section is an improved and corrected version of [17]. 
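The restrictions quoted above can be read as a check on the actions of a single thread T. The following Java sketch is an illustration under stated assumptions, not the formalization developed in this section: the actions of T are given as a list in execution order, the candidate store and assign are identified by their indices, and all names (`PrescientStoreCheck`, `Action`, `isAdmissible`) are hypothetical. The first restriction (the assign is bound to occur) is taken for granted because the assign is part of the list.

```java
import java.util.List;

/** Checks the quoted restrictions for one relocated store (illustrative sketch). */
final class PrescientStoreCheck {
    enum Kind { ASSIGN, STORE, LOAD, LOCK, UNLOCK }

    static final class Action {
        final Kind kind;
        final String variable;   // null for LOCK and UNLOCK
        final int value;         // only meaningful for ASSIGN and STORE
        Action(Kind kind, String variable, int value) {
            this.kind = kind;
            this.variable = variable;
            this.value = value;
        }
    }

    /**
     * actions: the actions of thread T in order; storeIdx is the position of the
     * early store, assignIdx the position of the assign it anticipates.
     */
    static boolean isAdmissible(List<Action> actions, int storeIdx, int assignIdx) {
        if (storeIdx < 0 || assignIdx >= actions.size() || storeIdx >= assignIdx) return false;
        Action store = actions.get(storeIdx);
        Action assign = actions.get(assignIdx);
        if (store.kind != Kind.STORE || assign.kind != Kind.ASSIGN) return false;
        if (!store.variable.equals(assign.variable)) return false;
        if (store.value != assign.value) return false;      // the store must send the assigned value
        for (int i = storeIdx + 1; i < assignIdx; i++) {
            Action x = actions.get(i);
            if (x.kind == Kind.LOCK) return false;           // no lock action intervenes
            boolean sameVar = store.variable.equals(x.variable);
            if (sameVar && x.kind == Kind.LOAD) return false;   // no load of V intervenes
            if (sameVar && x.kind == Kind.STORE) return false;  // no other store of V intervenes
        }
        return true;
    }
}
```

The formal treatment below is more general: it does not assume that the store and its matching assign are known in advance, which is why labellings and the matching function are introduced.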



4.1 Prescient Event Space Rules 

The specification of prescient stores [10, §17.8] seems to assume that it is known 
which Store events are prescient and which prescient Store event is matched by 
which Assign event (as if they would be e.g. re-arrangements of Store actions 
in the old sense). We do not assume such knowledge but adopt a more general 
approach introducing so-called labellings that allow us to use the “old” Store and 
Assign events as introduced in Section 2.1 with an additional “labelling” that 
states whether they are prescient or not. These labellings are not necessarily 
unique but it is always possible to infer a labelling at run time. It will turn
out, however, that the semantics is independent of the choice of labellings, see
Corollary 4.7.

$(\theta, \texttt{synchronized(p) \{ if(c) p.wait(); \}}, R) \mid (\theta', t', R),\ \emptyset$
    by [syn2, var]
$(\theta, \texttt{synchronized(o) \{ if(c) p.wait(); \}}, R) \mid (\theta', t', R),\ \{(\mathit{Lock}, \theta, o)\}$
    by [syn3, block2, statseq, if1, var]
$(\theta, \texttt{synchronized(o) \{ if(true) p.wait(); \}}, R) \mid (\theta', t', R),\ \{(\mathit{Lock}, \theta, o)\}$
    by [syn3, block2, statseq, if2]
$(\theta, \texttt{synchronized(o) \{ p.wait(); \}}, R) \mid (\theta', t', R),\ \{(\mathit{Lock}, \theta, o)\}$
    by [syn3, block2, statseq, expstat1, call1, var]
$(\theta, \texttt{synchronized(o) \{ o.wait(); \}}, R) \mid (\theta', t', R),\ \{(\mathit{Lock}, \theta, o)\}$
    by [syn3, block2, statseq, wait2]
$(\theta', t', R) \mid (\theta, \texttt{syn'd(o) \{ * \}}, W, o, 1),\ \{(\mathit{Lock}, \theta, o) \le (\mathit{Unlock}, \theta, o)\}$
    $=$
$(\theta', \texttt{synchronized(p) \{ p.notify(); \}}) \mid (\theta, t, W, o, 1),\ \eta$
    by [syn2, var]
$(\theta', \texttt{synchronized(o) \{ p.notify(); \}}, R) \mid (\theta, t, W, o, 1),\ \eta \oplus (\mathit{Lock}, \theta', o)$
    by [syn3, block2, statseq, expstat1, call1, var]
$(\theta', \texttt{synchronized(o) \{ o.notify(); \}}, R) \mid (\theta, t, W, o, 1),\ \eta \oplus (\mathit{Lock}, \theta', o)$
    by [syn3, block2, statseq, notify2]
$(\theta', \texttt{syn'd(o) \{ * \}}, R) \mid (\theta, \texttt{syn'd(o) \{ * \}}, N, o, 1),\ \eta \oplus (\mathit{Lock}, \theta', o)$
    by [syn3, block2, *]
$(\theta', \texttt{syn'd(o) \{ \}}, R) \mid (\theta, \texttt{syn'd(o) \{ * \}}, N, o, 1),\ \eta \oplus (\mathit{Lock}, \theta', o)$
    by [syn4, block1]
$(\theta', *, R) \mid (\theta, \texttt{synchronized(o) \{ * \}}, N, o, 1),\ \eta \oplus \cdots \oplus (\mathit{Unlock}, \theta', o)$
    by [ready]
$(\theta', *, R) \mid (\theta, \texttt{synchronized(o) \{ * \}}, R),\ \eta \oplus \cdots \oplus (\mathit{Lock}, \theta, o)$
    by [syn3, block2, *]
$(\theta', *, R) \mid (\theta, \texttt{synchronized(o) \{ \}}, R),\ \eta \oplus \cdots \oplus (\mathit{Lock}, \theta, o)$
    by [syn4, block1]
$(\theta', *, R) \mid (\theta, *, R),\ \eta \oplus \cdots \oplus (\mathit{Unlock}, \theta, o)$

Figure 3. Interaction of wait() and notify()

Prescient event spaces are defined, on the one hand, by a relaxation of the
event space rules: all rules which forbid prescient stores are cancelled and used
instead to define inductively a predicate that tells whether a Store event is
necessarily prescient. On the other hand, we have to add some rules to
ensure that a prescient Store corresponds to a relocated Store that obeys the
old event space rules.

First, we define an abbreviation for the maximal event of type $(A, \theta, l)$,
irrespective of its fourth component, occurring before some other event $a$, and thus
write $(A, \theta, l) \lhd a$ if

$$(A, \theta, l) \le a \;\wedge\; \big((A, \theta, l)' \le a \Rightarrow (A, \theta, l)' \le (A, \theta, l)\big)$$

If we write, however, $(\mathit{Store}, \theta, l, v) \lhd (\mathit{Assign}, \theta, l, v)$, i.e. both events are
written with their values and those are identical, then we mean the maximal
$(\mathit{Store}, \theta, l, v)$ event with value $v$ before $(\mathit{Assign}, \theta, l, v)$.

We define $\mathit{prescient}_\eta(\mathit{Store}, \theta, l)$ to be valid if one of the rules (P1-P7) below
holds. The subscript $\eta$ is usually omitted if it is clear from the context. Note
that $\Phi \not\Rightarrow \Psi$ abbreviates $\neg(\Phi \Rightarrow \Psi)$ where we use the conventions of Section 2.2,
i.e. $\neg(\Phi \Rightarrow \Psi)$ is short for $\neg\forall a\,.\,(\Phi \Rightarrow \exists b\,.\,\Psi)$ where $a$ and $b$ are lists of events
and $a$ contains precisely all events occurring in $\Phi$ except the bound $(\mathit{Store}, \theta, l)$
event.

$$(\mathit{Store}, \theta, l)' \le (\mathit{Store}, \theta, l) \;\not\Rightarrow\; (\mathit{Store}, \theta, l)' \le (\mathit{Assign}, \theta, l) \le (\mathit{Store}, \theta, l) \tag{P1}$$

$$(\mathit{Store}, \theta, l) \;\not\Rightarrow\; (\mathit{Assign}, \theta, l) \le (\mathit{Store}, \theta, l) \tag{P2}$$

$$(\mathit{Assign}, \theta, l, v') \le (\mathit{Store}, \theta, l, v) \;\not\Rightarrow\; (\mathit{Assign}, \theta, l, v') \le (\mathit{Assign}, \theta, l)' \le (\mathit{Store}, \theta, l, v) \tag{P3}$$

$$(\mathit{Lock}, \theta) \le (\mathit{Store}, \theta, l) \;\not\Rightarrow\; (\mathit{Lock}, \theta) \le (\mathit{Assign}, \theta, l) \le (\mathit{Store}, \theta, l) \tag{P4}$$

$$(\mathit{Store}, \theta, l)' \le (\mathit{Store}, \theta, l) \wedge \mathit{prescient}((\mathit{Store}, \theta, l)') \;\not\Rightarrow\; (\mathit{Store}, \theta, l)' \le (\mathit{Assign}, \theta, l) \le (\mathit{Assign}, \theta, l)' \le (\mathit{Store}, \theta, l) \tag{P5}$$

$$(\mathit{Store}, \theta, l, v) \lhd (\mathit{Assign}, \theta, l, v) \lhd (\mathit{Load}, \theta, l) \;\not\Rightarrow\; (\mathit{Assign}, \theta, l, v) \le (\mathit{Store}, \theta, l, v)' \le (\mathit{Load}, \theta, l) \tag{P6}$$

$$(\mathit{Store}, \theta, l, v) \lhd (\mathit{Assign}, \theta, l, v) \lhd (\mathit{Unlock}, \theta) \;\not\Rightarrow\; (\mathit{Assign}, \theta, l, v) \le (\mathit{Store}, \theta, l, v)' \le \mathit{write\_of}((\mathit{Store}, \theta, l, v)') \le (\mathit{Unlock}, \theta) \wedge \neg\mathit{prescient}((\mathit{Store}, \theta, l, v)') \tag{P7}$$

Rules (P1-P4) are the negations of (4), (6), (9), and (17), respectively, that
forbid prescient Store events. Rule (P5) is sound because if there is only one
$(\mathit{Assign}, \theta, l, v)$ between two stores and the first is prescient, then by re-arranging
the prescient Store two Store events would follow each other without a triggering
Assign in between, which contradicts the old semantics. Rules (P6-P7) ensure
that in cases where old event space rules (3) and (15) are violated, still a relocated
(i.e. prescient) Store exists which is responsible for storing the right value. So
e.g. (P6) states that if any Store between the last Assign before a Load and the
Load itself is necessarily prescient, then the last Store before the Assign must
also be prescient and thus responsible for fulfilling old (3) when relocated. Note
that it is sufficient to consider the last Assign before the Load (and the Unlock,
respectively).

With respect to the other (old) event space laws, we keep rules (1), (2), (5), 
(7-8), (10-14), and (16). 

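Rule (3) has to be adapted as follows, allowing prescient Stores on the right
hand side of an implication:

$$(\mathit{Assign}, \theta, l, v) \lhd (\mathit{Load}, \theta, l) \;\Rightarrow\; \big((\mathit{Assign}, \theta, l, v) \le (\mathit{Store}, \theta, l, v) \le (\mathit{Load}, \theta, l)\big) \;\vee\; \big((\mathit{Store}, \theta, l, v) \lhd (\mathit{Assign}, \theta, l, v) \lhd (\mathit{Load}, \theta, l)\big) \tag{3'}$$

and rule (15) analogously:

$$(\mathit{Assign}, \theta, l, v) \lhd (\mathit{Unlock}, \theta) \;\Rightarrow\; \big((\mathit{Assign}, \theta, l, v) \le (\mathit{Store}, \theta, l, v) \le \mathit{write\_of}(\mathit{Store}, \theta, l, v) \le (\mathit{Unlock}, \theta) \wedge \neg\mathit{prescient}(\mathit{Store}, \theta, l)\big) \;\vee\; \big((\mathit{Store}, \theta, l, v) \lhd (\mathit{Assign}, \theta, l, v) \lhd (\mathit{Unlock}, \theta) \wedge \mathit{write\_of}(\mathit{Store}, \theta, l, v) \le (\mathit{Unlock}, \theta)\big) \tag{15'}$$

Both rules are used in cooperation with (P6-P7). Note that in the left branch
of the disjunction in the conclusion of (3') it is unnecessary to stipulate
$\neg\mathit{prescient}(\mathit{Store}, \theta, l)$ since this will follow from (NP3) and (18) that will be
defined below.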

We can also infer which Store events are necessarily not prescient: We define
the predicate $\mathit{non\_prescient}(\mathit{Store}, \theta, l)$ on the given event space $\eta$ to be true if
one of the rules (NP1-NP3) is fulfilled.

$$\forall a \in \{(\mathit{Lock}), (\mathit{Load}, l), (\mathit{Store}, l)\}\,.\ (\mathit{Store}, \theta, l, v) \le a \;\not\Rightarrow\; (\mathit{Store}, \theta, l, v) \le (\mathit{Assign}, \theta, l, v) \le a \tag{NP1}$$

$$(\mathit{Store}, \theta, l) \le (\mathit{Store}, \theta, l)' \wedge \mathit{non\_prescient}((\mathit{Store}, \theta, l)') \;\not\Rightarrow\; (\mathit{Store}, \theta, l) \le (\mathit{Assign}, \theta, l) \le (\mathit{Assign}, \theta, l)' \le (\mathit{Store}, \theta, l)' \tag{NP2}$$

$$\big((\mathit{Assign}, \theta, l, v) \lhd (\mathit{Unlock}, \theta) \wedge (\mathit{Assign}, \theta, l, v) \le (\mathit{Store}, \theta, l, v) \le (\mathit{Unlock}, \theta)\big) \;\not\Rightarrow\; \big((\mathit{Assign}, \theta, l, v) \le (\mathit{Store}, \theta, l, v)' \le \mathit{write\_of}((\mathit{Store}, \theta, l, v)') \le (\mathit{Unlock}, \theta) \wedge \neg\mathit{prescient}((\mathit{Store}, \theta, l, v)')\big) \;\vee\; \big((\mathit{Store}, \theta, l, v)'' \lhd (\mathit{Assign}, \theta, l, v) \lhd (\mathit{Unlock}, \theta) \wedge \neg\mathit{non\_prescient}((\mathit{Store}, \theta, l, v)'')\big) \tag{NP3}$$

Rule (NP1) corresponds to the second, third, and fourth requirement in [10,
§17.8] (see top of Section 4), (NP2) to (P5), and (NP3) to (P7). If $\mathit{Assign} \lhd
\mathit{Unlock}$, such that the Assign is the last one before the Unlock, then (NP3) says
that if all Stores in between are prescient but one, then this one is necessarily
non_prescient if the following holds: There is no matching Store before the
Assign or the last such is non_prescient. This is a sound rule, because if the Store
of discourse were not non-prescient, then one might choose it to be prescient,
but then no last matching Store would occur before the Assign that could be
chosen prescient. In such a case the Assign would not have been stored before
the Unlock (not even by a prescient store) and hence the old semantics is not
preserved.

Notice that the predicate prescient propagates from past to present with the
exception of (P6-P7), which in some cases need to look back to the last non-prescient
Store, whereas non_prescient is computed in the opposite direction.
Also observe that $\neg\mathit{non\_prescient}(s)$ is not equivalent to $\mathit{prescient}(s)$ for a Store
event $s$ and hence also $\mathit{prescient}(s) \vee \mathit{non\_prescient}(s)$ does not always hold.

Finally, we add the new rule

$$(\mathit{Store}, \theta, l) \;\Rightarrow\; \neg\big(\mathit{prescient}(\mathit{Store}, \theta, l) \wedge \mathit{non\_prescient}(\mathit{Store}, \theta, l)\big) \tag{18}$$

according to the specification of prescient Store events. This rule in cooperation
with (NP1-NP3) prohibits that prescient Stores occur at places ruled out by the
specification.

Summing up, a prescient event space is a poset of events every chain of which
can be counted monotonically and satisfying conditions (1), (2), (3'), (5), (7-8),
(10-14), (15'), (16), and (18).

The non-deterministic operation $\oplus$ of Section 2.4 also works for prescient
event spaces (the only difference being that it defines a predicate on event spaces
that are prescient).

An event space is called complete if for all Read and Store events corresponding
Load and Write events exist (all load_of and write_of functions are
total; see the discussion at the end of Section 2.2). A prescient event space $\eta$ is
called complete if additionally for any necessarily prescient $(\mathit{Store}, \theta, l, v)$ there
is a subsequent $(\mathit{Assign}, \theta, l, v)$. Note that it makes sense only for the final event
space of a reduction sequence to be complete. During execution, the matching
Assign for a prescient Store might not have happened. A complete prescient
event space fulfills the first and last requirement in [10, §17.8] (see top of Section 4).
A prescient event space $\eta$ is called completable if there is a sequence of
events $a$ such that $\eta \oplus a$ is complete.

4.2 Labellings 

According to the definitions above, even for complete prescient event spaces there
might be a Store event $s$ in a given event space for which neither $\mathit{prescient}(s)$
nor $\mathit{non\_prescient}(s)$ is derivable. We define so-called labellings which allow us to
choose, to a certain extent, which Store shall be considered prescient and which
not.

For a complete prescient event space $\eta$ a labelling is a predicate $\ell$ on Stores
that obeys rules (L1-L4) below together with a corresponding matching function

$$\mathit{passign\_of}_{\ell} : \{(\mathit{Store}, \theta, l) \in \eta \mid \ell(\mathit{Store}, \theta, l)\} \to \eta(\mathit{Assign}, \theta, l)$$

that fulfills the axioms (M1-M5). Note that rule (M1) ensures that passign_of
is total.

$$\mathit{prescient}(s) \Rightarrow \ell(s) \tag{L1}$$

$$\mathit{non\_prescient}(s) \Rightarrow \neg\ell(s) \tag{L2}$$

$$\text{(P5)}[\ell/\mathit{prescient}] \Rightarrow \ell(\mathit{Store}, \theta, l, v) \tag{L3}$$

$$\text{(NP3)}[\ell/\mathit{prescient}, \neg\ell/\mathit{non\_prescient}] \Rightarrow \neg\ell(\mathit{Store}, \theta, l, v) \tag{L4}$$

$$(\mathit{Store}, \theta, l, v) \wedge \ell(\mathit{Store}, \theta, l, v) \;\Rightarrow\; (\mathit{Store}, \theta, l, v) \le \mathit{passign\_of}_{\ell}(\mathit{Store}, \theta, l, v) = (\mathit{Assign}, \theta, l, v) \tag{M1}$$

$$\forall a \in \{(\mathit{Lock}), (\mathit{Load}, l), (\mathit{Store}, l)\}\,.\ (\mathit{Store}, \theta, l) \le a \wedge \ell(\mathit{Store}, \theta, l) \;\Rightarrow\; \mathit{passign\_of}_{\ell}(\mathit{Store}, \theta, l) \le a \tag{M2}$$

$$\mathit{passign\_of}_{\ell}(\mathit{Store}, \theta, l) \le (\mathit{Store}, \theta, l)' \wedge \neg\ell((\mathit{Store}, \theta, l)') \;\Rightarrow\; \mathit{passign\_of}_{\ell}(\mathit{Store}, \theta, l) \le (\mathit{Assign}, \theta, l)' \le (\mathit{Store}, \theta, l)' \tag{M3}$$

$$\big((\mathit{Store}, \theta, l, v) \lhd (\mathit{Assign}, \theta, l, v) \lhd (\mathit{Load}, \theta, l) \wedge \ell(\mathit{Store}, \theta, l, v) \;\not\Rightarrow\; (\mathit{Assign}, \theta, l, v) \le (\mathit{Store}, \theta, l, v)' \le (\mathit{Load}, \theta, l)\big) \;\Rightarrow\; \mathit{passign\_of}_{\ell}(\mathit{Store}, \theta, l, v) = (\mathit{Assign}, \theta, l, v) \tag{M4}$$

$$\big((\mathit{Store}, \theta, l, v) \lhd (\mathit{Assign}, \theta, l, v) \lhd (\mathit{Unlock}, \theta) \wedge \ell(\mathit{Store}, \theta, l, v) \;\not\Rightarrow\; (\mathit{Assign}, \theta, l, v) \le (\mathit{Store}, \theta, l, v)' \le \mathit{write\_of}((\mathit{Store}, \theta, l, v)') \le (\mathit{Unlock}, \theta) \wedge \neg\ell((\mathit{Store}, \theta, l, v)')\big) \;\Rightarrow\; \mathit{passign\_of}_{\ell}(\mathit{Store}, \theta, l, v) = (\mathit{Assign}, \theta, l, v) \tag{M5}$$

In rule (L3) we use "(P5)$[\ell/\mathit{prescient}]$" to abbreviate the axiom (P5) where
prescient is syntactically replaced by $\ell$ and the (bound) event $(\mathit{Store}, \theta, l, v)$ of
(P5) coincides with the one in the conclusion of (L3). The analogous convention
applies for (L4). Rule (L3) is necessary to propagate $\ell$ (as prescient) according
to (P5), and rule (L4) to propagate $\neg\ell$ (as non_prescient) according to (NP3).
Observe that one does not need similar rules in order to propagate (P7) and
(NP2), since those are already covered by (the contraposition of) rules (L4) and
(L3), respectively.

By rule (M3) one can never choose an Assign event as matching when its
re-arrangement would lead to a situation forbidden by the old event space rule (4),
i.e. where two Store events would follow each other. Rules (M4) and (M5) fix the
matching for the prescient Store in situations where rules (3') and (15') apply
but only the right disjunct in their conclusion is fulfilled. Note that for (M4-M5)
the nested implication

$$(\Phi \not\Rightarrow \Psi) \;\Rightarrow\; \mathit{passign\_of}_{\ell}(\mathit{Store}, \theta, l, v) = (\mathit{Assign}, \theta, l, v)$$

is read with the usual conventions for $\not\Rightarrow$ but $(\mathit{Store}, \theta, l, v)$ and $(\mathit{Assign}, \theta, l, v)$
are obviously universally quantified outermost.

Lemma 4.1. For any complete prescient event space one can give a labelling.

Proof. We choose $\ell := \mathit{prescient}$ and show that it fulfills the labelling rules: (L1)
holds by definition of $\ell$, (L2) follows from (18), (L3) follows from (P5), and (L4)
can be shown by contradiction employing (15'), (P7), and (18).

For any $(\mathit{Store}, \theta, l)$ in the event space with $\ell(\mathit{Store}, \theta, l)$ there is a following
matching $(\mathit{Assign}, \theta, l)$ event as the event space of discourse is complete.
So for $\mathit{passign\_of}_{\ell}$ we can choose the function which maps any labelled
$(\mathit{Store}, l)$ to the last following matching Assign before the first following event
$a \in \{(\mathit{Load}, l), (\mathit{Lock}), (\mathit{Unlock}), (\mathit{Store}, l)\}$, unless $a = (\mathit{Store}, l)$ and $\neg\ell(\mathit{Store}, l)$,
when the last but one such Assign is chosen, which exists by (NP2). Then
$\mathit{passign\_of}_{\ell}$ is a matching function by definition.

4.3 Prescient Operational Semantics 

We obtain the prescient operational semantics from the old semantics of Section 3 
just by switching from the event spaces of Section 2 to the prescient event spaces 
of Section 4 keeping the operational rules untouched. 

For the prescient operational semantics we write $\rightarrowtriangle$. Moreover, let $\mathit{Conf}_{\rightarrowtriangle}$
denote the set of configurations with prescient event spaces, and $\mathit{Conf}_{\to}$ those
according to the definition of $\longrightarrow$ of Section 3.

Lemma 4.2. Any event space $\eta$ (obeying the old rules) is also a prescient event
space, thus any old configuration is a new configuration, i.e., $\mathit{Conf}_{\to} \subseteq \mathit{Conf}_{\rightarrowtriangle}$,
and any reduction $\Gamma \longrightarrow \Gamma'$ is also a prescient one, i.e. $\Gamma \rightarrowtriangle \Gamma'$ holds as well.

Proof. Assume $\eta$ is an event space satisfying the old rules. By a simple induction,
$\mathit{prescient}(s)$ never holds for any Store event $s$ in $\eta$. Thus $\eta$ is a prescient
event space because the new rules form a subset of the old rules. Since the
configurations only differ in the event space definition and the rules of the semantics
are not changed at all, the other claims of the lemma now hold trivially.

Since we use labellings our operational semantics is very liberal. It accepts
reductions using Store events even if it is not clear during execution whether this
Store event is meant to be prescient or not. In such a case, however, the prescient
Store is not done as early as possible. Therefore, in practical cases, any Store
which is not recognized by the rules (P1-P7) can be considered non_prescient.
This corresponds to choosing the labelling to be simply prescient (cf. Lemma 4.1).
As a consequence, the labelling can be computed at run time. Due to (P6-P7),
however, it is not always possible to detect immediately whether a Store is
prescient; sometimes one has to wait for a Load or Lock event to happen. Also
the matching can be computed at run time with a little lookahead, cf. (M4-M5).

By the proof of Lemma 4.1, however, labellings only exist for complete prescient
event spaces, hence, in the rest of the paper, any prescient event space $\eta$
is supposed to be completable. Any completion of $\eta$ has a labelling and though
its restriction to $\eta$ does not necessarily give a labelling, because (M1) obviously
need not be valid, it is easily checked that all the other rules for labellings still
hold. Thus for any completable prescient event space there exists a "partial"
labelling, which fulfills only (L1-L4) and (M2-M5). Therefore we can assume
that any completable prescient event space is endowed with a fixed (partial)
labelling $\ell$ that, for the sake of simplicity, will be exhibited in form of special
action names: pStore and pAssign. If $\ell(\mathit{Store}, \theta, l, v)$ holds then $(\mathit{Store}, \theta, l, v)$
is denoted $(\mathit{pStore}, \theta, l, v)$ and analogously for the corresponding Assign we use
pAssign. This notation contains implicitly all information given by the matching
function, since by monotonicity of passign_of for every $(\mathit{pStore}, \theta, l, v)$ the first
subsequent $(\mathit{pAssign}, \theta, l, v)$ must be the matching one.



4.4 Prescient Semantics is Conservative 

The relation between the “normal” and the “prescient” semantics is described 
in [10, §17.8, p. 408] as follows: 

“The purpose of this relaxation is to allow optimizing Java compilers to 
perform certain kinds of code rearrangements that preserve the semantics of 
properly synchronized programs but might be caught in the act of performing 
memory actions out of order by programs that are not properly synchronized.” 

This has to be formalized in the sequel. The following notation, exemplified
for $\longrightarrow$ only, will be used analogously for all kinds of arrows: $\xrightarrow{r}$ denotes a
one-step reduction with rule $r$; if $e = (r_1, \ldots, r_n)$ is a list of rules then $\xrightarrow{e}$
denotes $\xrightarrow{r_1} \cdots \xrightarrow{r_n}$; if the list is irrelevant we write $\longrightarrow^*$. For rules that change
the event space we often decorate arrows with actions instead of rule names as
the latter are ambiguous.

First, we observe that $\rightarrowtriangle$ and $\longrightarrow$ cannot be bisimilar by definition since
$\rightarrowtriangle$ permits (prescient) Store actions where $\longrightarrow$ does not. But $\rightarrowtriangle$ cannot even
be bisimilar to the reflexive closure of $\longrightarrow$, since simulating a $(\mathit{pStore}, \theta, l)$ and
the following Writes by void steps leads to inequivalent configurations (since the
main memories will contain different values for $l$).

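As a prerequisite for a simulation relation of type $\mathit{Conf}_{\rightarrowtriangle} \times \mathit{Conf}_{\to}$, we define
an equivalence on prescient configurations $\sim\ \subseteq \mathit{Conf}_{\rightarrowtriangle} \times \mathit{Conf}_{\rightarrowtriangle}$ as follows:

$$(T, \eta, \mu) \sim (T', \eta', \mu') \iff T = T' \wedge (T, \eta, \mu) \simeq (T', \eta', \mu')$$

$$(T, \eta, \mu) \simeq (T', \eta', \mu') \iff \forall a\,.\ (\eta \oplus a{\downarrow} \iff \eta' \oplus a{\downarrow}) \ \wedge\ \forall e\,.\ (T, \eta, \mu) \Downarrow^{e} (T_1, \eta_1, \mu_1) \wedge (T', \eta', \mu') \Downarrow^{e} (T_2, \eta_2, \mu_2) \Rightarrow \mu_1 = \mu_2$$

where $a$ is any sequence of actions, $e$ is a sequence of rules and
$(T, \eta, \mu) \Downarrow^{e} (T', \eta', \mu')$ if $(T, \eta, \mu) \stackrel{e}{\rightarrowtriangle}{}^{*} (T', \eta', \mu')$ such that $\eta'$ is complete. (For the sake of
simplicity we do not consider the extended configurations of Section 3.10.)

This equivalence relation is obviously preserved by the rules of the semantics:

Lemma 4.3. The relation $\sim$ is an equivalence relation such that if $\Gamma_1 \sim \Gamma_2$
then $\Gamma_1 \stackrel{r}{\rightarrowtriangle} \Gamma_1' \iff \Gamma_2 \stackrel{r}{\rightarrowtriangle} \Gamma_2'$ for any rule $r$, and if such a reduction $r$ exists then
$\Gamma_1' \sim \Gamma_2'$ holds.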

In order to establish a bisimulation result, we must delay all the operations
which are possible due to a $(\mathit{pStore}, \theta, l, v)$ until the matching pAssign event.

But that will not work for all kinds of programs. Consider the following
example:

$$(\theta, \texttt{\{ synchronized(p) \{ p.x = 1; \} \}}, \sigma) \mid (\theta', \texttt{\{ p.x = 2; \}}, \sigma')$$

with $\sigma(p) = \sigma'(p) = o$ and $l = o.x$. Its execution may give rise to a sequence of
computation steps which contains the following complete subsequence of actions:

$$(\mathit{Lock}, \theta, o),\ (\mathit{Assign}, \theta, l, 1),\ (\mathit{Store}, \theta, l, 1),\ (\mathit{pStore}, \theta', l, 2),\ (\mathit{Write}, \theta', l, 2),\ (\mathit{Write}, \theta, l, 1),\ (\mathit{Unlock}, \theta, o),\ (\mathit{pAssign}, \theta', l, 2)$$

In a simulation the $(\mathit{pStore}, \theta', l, 2)$ is illegal w.r.t. the old event space definition
and can only be simulated by a void (i.e. delaying) step, as can the following
Write. Now the $(\mathit{Write}, \theta, l, 1)$ and the corresponding $(\mathit{Store}, \theta, l, 1)$ are bound to
occur before the Unlock. Finally, after the pAssign we must recover the pending
prescient $(\mathit{Store}, \theta', l, 2)$ and its corresponding $(\mathit{Write}, \theta', l, 2)$. According to this
simulation, $l$ has value 2 in the global memory but the reduction via $\rightarrowtriangle$ yields
1 for $l$. Thus, both end-configurations are not equivalent.

Consequently, we have to restrict ourselves to "properly synchronized" programs.
A multi-threaded program $T$ is called properly synchronized if any (prescient)
event space in any possible configuration occurring during reduction fulfills
the following axiom:

$$(\mathit{Assign}, \theta, l),\ (\mathit{Assign}, \theta', l) \;\Rightarrow\; (\mathit{Assign}, \theta, l) \le (\mathit{Unlock}, \theta, o) \le (\mathit{Lock}, \theta', o) \le (\mathit{Assign}, \theta', l) \tag{19}$$

where the Assigns may correspond to prescient Store actions. Analogously,
an event space is called properly synchronized if it fulfills (19). A sufficient
condition for proper synchronization is obviously the syntactic criterion
that in a program shared variables may only be assigned in synchronized blocks.
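Axiom (19) can be checked mechanically on a finite execution. The sketch below is an illustration under simplifying assumptions that are not part of the formal development: the event space is given as a linear trace (a total order) rather than a poset, events carry only a kind, a thread and a target (the variable for Assign, the object for Lock and Unlock), and pAssign events count as Assigns. The class and method names are hypothetical.

```java
import java.util.List;

/** Checks axiom (19) on a linear trace of events (illustrative sketch). */
final class ProperSyncCheck {
    static final class Event {
        final String kind, thread, target;
        Event(String kind, String thread, String target) {
            this.kind = kind;
            this.thread = thread;
            this.target = target;
        }
        boolean isAssign() { return kind.equals("Assign") || kind.equals("pAssign"); }
    }

    /** True if every pair of Assigns to one variable by different threads is separated
     *  by an Unlock of the earlier thread followed by a Lock of the later thread
     *  on the same object. */
    static boolean properlySynchronized(List<Event> trace) {
        for (int i = 0; i < trace.size(); i++) {
            Event a = trace.get(i);
            if (!a.isAssign()) continue;
            for (int j = i + 1; j < trace.size(); j++) {
                Event b = trace.get(j);
                boolean conflicting = b.isAssign()
                        && b.target.equals(a.target)
                        && !b.thread.equals(a.thread);
                if (conflicting && !handOver(trace, i, j)) return false;
            }
        }
        return true;
    }

    /** Looks for Unlock(from, o) before Lock(to, o) strictly between positions i and j. */
    private static boolean handOver(List<Event> trace, int i, int j) {
        String from = trace.get(i).thread;
        String to = trace.get(j).thread;
        for (int u = i + 1; u < j; u++) {
            Event un = trace.get(u);
            if (!un.kind.equals("Unlock") || !un.thread.equals(from)) continue;
            for (int k = u + 1; k < j; k++) {
                Event lo = trace.get(k);
                if (lo.kind.equals("Lock") && lo.thread.equals(to)
                        && lo.target.equals(un.target)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Applied to the action sequence of the counterexample above, the check fails, because no Lock by the second thread separates the two assignments to the same variable.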

Proper synchronization guarantees that between a prescient Store event and 
its corresponding pAssign event no other thread can change the main memory: 

Lemma 4.4. Let $\eta$ be a properly synchronized complete prescient event space.
If $\theta \neq \theta'$ the following holds:

$$(\mathit{pStore}, \theta, l) \le (\mathit{Write}, \theta', l) \;\Rightarrow\; \mathit{passign\_of}(\mathit{pStore}, \theta, l) \le (\mathit{Write}, \theta', l)$$

Proof. Let $(\mathit{pStore}, \theta, l) \le (\mathit{Write}, \theta', l)$ with $\theta \neq \theta'$ and let $(\mathit{pAssign}, \theta, l) =
\mathit{passign\_of}(\mathit{pStore}, \theta, l)$.

First, assume that $\mathit{store\_of}(\mathit{Write}, \theta', l) = (\mathit{Store}, \theta', l) \le (\mathit{Write}, \theta', l)$, for
a non-prescient $(\mathit{Store}, \theta', l)$ such that we have $(\mathit{Assign}, \theta', l) \le (\mathit{Store}, \theta', l)$
by the negation of (P2). There is a maximal non-prescient $(\mathit{Assign}, \theta', l) \lhd
(\mathit{Store}, \theta', l)$ such that by (P3) the fourth (value) components of $(\mathit{Assign}, \theta', l)$
and $(\mathit{Store}, \theta', l)$ are equal. Moreover, by (M3) no $(\mathit{pAssign}, \theta', l)$ whatsoever can
occur between those two. If now

$$(\mathit{pAssign}, \theta, l) \le (\mathit{Unlock}, \theta, o) \le (\mathit{Lock}, \theta', o) \le (\mathit{Assign}, \theta', l)$$

we obviously have $(\mathit{pAssign}, \theta, l) \le (\mathit{Write}, \theta', l)$. Otherwise, from proper
synchronization, i.e. (19), it follows

$$(\mathit{Assign}, \theta', l) \le (\mathit{Unlock}, \theta', o) \le (\mathit{Lock}, \theta, o) \le (\mathit{pAssign}, \theta, l) \tag{$*$}$$

for a suitable $(\mathit{Unlock}, \theta')$-$(\mathit{Lock}, \theta)$ pair. We show that even

$$(\mathit{Assign}, \theta', l) \le (\mathit{Store}, \theta', l) \le \mathit{write\_of}(\mathit{Store}, \theta', l) \le (\mathit{Unlock}, \theta', o) \tag{$**$}$$

which proves the lemma since, by the negation of (NP3), we also have

$$(\mathit{Lock}, \theta, o) \le (\mathit{pStore}, \theta, l) \le (\mathit{pAssign}, \theta, l)$$

which together with ($*$) leads to a contradiction to our assumption that
$(\mathit{pStore}, \theta, l) \le (\mathit{Write}, \theta', l)$.

In order to prove ($**$), first note that

$$(\mathit{Store}, \theta', l) \le (\mathit{Unlock}, \theta') \;\Rightarrow\; (\mathit{Store}, \theta', l) \le \mathit{write\_of}(\mathit{Store}, \theta', l) \le (\mathit{Unlock}, \theta')$$

holds in arbitrary prescient event spaces. To see this, it is sufficient to consider
the maximal $(\mathit{Store}, \theta', l) \lhd (\mathit{Unlock}, \theta')$ by monotonicity of write_of. By
(P7) and (M4) it is then impossible that there is also another $(\mathit{Assign}, \theta', l)$
or $(\mathit{pAssign}, \theta', l)$ after $(\mathit{Store}, \theta', l)$. There is a maximal $(\mathit{Assign}, \theta', l) \lhd
(\mathit{Store}, \theta', l)$. Between those two events no $(\mathit{pAssign}, \theta', l)$ can occur due to (M3),
hence (15') is applicable and we are done.

For a proof of ($**$) by contradiction, assume that

$$(\mathit{Assign}, \theta', l) \le (\mathit{Unlock}, \theta', o) \le (\mathit{Store}, \theta', l)$$

such that $(\mathit{Assign}, \theta', l) \lhd (\mathit{Unlock}, \theta', o)$ follows. Then by (15') we have

$$(\mathit{Assign}, \theta', l) \le (\mathit{Store}, \theta', l)' \le \mathit{write\_of}((\mathit{Store}, \theta', l)') \le (\mathit{Unlock}, \theta')$$

since if we only had

$$(\mathit{pStore}, \theta', l, v) \lhd (\mathit{Assign}, \theta', l, v) \le (\mathit{Unlock}, \theta', o) \le (\mathit{Store}, \theta', l)$$

the matching rule (M5) would be violated. By (P1), however, there must exist
a $(\mathit{pAssign}, \theta', l)$ event such that

$$(\mathit{Store}, \theta', l)' \le (\mathit{pAssign}, \theta', l) \le (\mathit{Store}, \theta', l)$$

which contradicts the assumed maximality of $(\mathit{Assign}, \theta', l)$.

The second case that $\mathit{store\_of}(\mathit{Write}, \theta', l) = (\mathit{pStore}, \theta', l) \le (\mathit{Write}, \theta', l)$ is
treated analogously.

In the rest of this subsection we formalize the already sketched simulation
idea. To that end, in the sequel $\Delta$ (possibly with annotations) stands for
configurations in $\mathit{Conf}_{\to}$ and $\Gamma$ for new configurations in $\mathit{Conf}_{\rightarrowtriangle}$. Recall that any
old configuration is also a valid one in the new sense by Lemma 4.2. According
to the observations above, we define a new reduction relation
$\rhd\!\!\to\ \subseteq (\mathit{Conf}_{\to} \times E^*) \times (\mathit{Conf}_{\to} \times E^*)$ where $E = \{(\mathit{pStore}), (\mathit{Write}), (\mathit{Read})\}$ by the
rules (reds)-(redd) below. Note that we do not need to treat $(\mathit{Load})$ events
(cf. rule (NP3)). The corresponding $\rhd\!\!\to$-configurations $(\Delta, e)$ consist of an old
configuration $\Delta \in \mathit{Conf}_{\to}$ plus a list of "pending" events $e$. Appending an event
$a$ at the end of a list $e$ is written $e \circ a$. An additional operation $\mathit{split}_{\theta,l}(e)$ is
needed. Given a list of events $e$ it yields a pair of lists $(e_l, e')$ where both are
sublists of $e$; the sublist $e_l$ is obtained from $e$ by extracting all $(\mathit{pStore}, \theta, l)$,
$(\mathit{Write}, \theta, l)$ and $(\mathit{Read}, \theta', l)$ events and simultaneously changing a $(\mathit{pStore}, \theta, l)$
into $(\mathit{Store}, \theta, l)$; $e'$ is $e_l$'s complement w.r.t. $e$.

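(reds)  $(\Delta, e) \stackrel{(\mathit{pStore}, \theta, l, v)}{\rhd\!\!\to} (\Delta, e \circ (\mathit{pStore}, \theta, l, v))$

(redw)  $(\Delta, e) \stackrel{(\mathit{Write}, \theta, l, v)}{\rhd\!\!\to} (\Delta, e \circ (\mathit{Write}, \theta, l, v))$  if $(\mathit{pStore}, \theta, l, v) \in e \wedge \mathit{write\_of}(\mathit{pStore}, \theta, l, v) = (\mathit{Write}, \theta, l, v)$

(redr)  $(\Delta, e) \stackrel{(\mathit{Read}, \theta', l, v)}{\rhd\!\!\to} (\Delta, e \circ (\mathit{Read}, \theta', l, v))$  if $(\mathit{Write}, \theta, l) \in e$

(reda)  $(\Delta, e) \stackrel{(\mathit{pAssign}, \theta, l, v)}{\rhd\!\!\to} (\Delta', e')$  if $\mathit{split}_{\theta,l}(e) = (e_l, e') \wedge \Delta \xrightarrow{(\mathit{Assign}, \theta, l, v)} \xrightarrow{e_l} \Delta'$

(redd)  $(\Delta, e) \stackrel{r}{\rhd\!\!\to} (\Delta', e)$  for any other case $r$ if $\Delta \xrightarrow{r} \Delta'$

To relate configurations of $\rightarrowtriangle$ and $\rhd\!\!\to$ reductions the simulation relation
$\approx\ \subseteq \mathit{Conf}_{\rightarrowtriangle} \times (\mathit{Conf}_{\to} \times E^*)$ is defined as follows:

$$\Gamma \approx (\Delta, e) \text{ if, and only if, } (\Delta, e){\downarrow} \ \wedge\ \Delta \stackrel{e}{\rightarrowtriangle} \Gamma_\Delta \ \wedge\ \Gamma_\Delta \sim \Gamma$$

where

$$(\Delta, e){\downarrow} \text{ if, and only if, } \exists \Delta'\,.\ (\Delta', \varepsilon) \rhd\!\!\to^{*} (\Delta, e)$$

i.e. $\Gamma$ is equivalent to $(\Delta, e)$ if $e$ is obtained correctly by means of $\rhd\!\!\to$ and $\Gamma$ is
equivalent to the completion of $\Delta$, usually called $\Gamma_\Delta$, by executing the pending
events in $e$. Note that $\rightarrowtriangle$ is used here for the sequence of events $e$, as $e$ may
contain prescient Store events.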

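Below we use the following notation of a commuting diagram

$$\begin{array}{ccc} \Gamma & \longrightarrow & \Gamma_1 \\ \downarrow & & \downarrow \\ \Gamma_2 & \longrightarrow & \Gamma_3 \end{array}$$

stating that $\Gamma \longrightarrow \Gamma_1 \longrightarrow \Gamma_3$ and $\Gamma \longrightarrow \Gamma_2 \longrightarrow \Gamma_3'$ and $\Gamma_3 \sim \Gamma_3'$. This notation is
also used for any other kind of arrows.

Lemma 4.5. If $\Gamma \approx (\Delta, e)$ and $\Gamma \stackrel{r}{\rightarrowtriangle} \Gamma'$, where $r$ is as in case (redd) and $\Gamma$
stems from a properly synchronized program, then $\Delta \xrightarrow{r} \Delta'$ and the diagram

$$\begin{array}{ccccc} \Delta & \stackrel{e}{\rightarrowtriangle} & \Gamma_\Delta & \sim & \Gamma \\ \downarrow r & & \downarrow r & & \downarrow r \\ \Delta' & \stackrel{e}{\rightarrowtriangle} & \Gamma_{\Delta'} & \sim & \Gamma' \end{array}$$

commutes; thus $\Gamma \approx (\Delta, e) \stackrel{r}{\rhd\!\!\to} (\Delta', e) \approx \Gamma'$ holds.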



Proof (sketch). First note that if the left square commutes, then the whole
diagram commutes by Lemma 4.3.

Next, observe that $r$ can be executed also before $e$. For a proof of this
check that $r$ does not depend on $e$ by inspecting the relevant laws for event
spaces: Rules (5), (16) refer to in-between-events which are not possible in
$e \in E^*$. Rules (10) and (3') are impossible since corresponding Loads are ruled
out by (NP1) and (18). Rule (11) is not relevant as matching Writes are treated
in (redw). Thus, we are left with (15'). Suppose $r = (\mathit{Unlock}, \theta)$ and that
$(\mathit{pAssign}, \theta, l, v) \lhd r$ is ensured via rule (15') by a preceding Store only (i.e. the
right branch of the disjunction in (15') holds exclusively), then the last of those
preceding $(\mathit{Store}, \theta, l, v)$ events is prescient, i.e. $\ell(\mathit{Store}, \theta, l, v)$ holds by (P6).
Therefore, $(\mathit{pAssign}, \theta, l, v) = \mathit{passign\_of}_{\ell}(\mathit{Store}, \theta, l, v)$ by (M4) such that $e$
cannot contain the Store anymore as it is obtained via $\rhd\!\!\to$.

To prove that the diagram commutes it suffices, by definition of $\sim$, to show
that the same actions are executed, but maybe in different order. We have to
ensure that Write events of the same variable from different threads are not
re-ordered. Consider some $(\mathit{Write}, \theta, l)$ of $e$. By Lemma 4.4 Write events of a
different thread $\theta'$ cannot occur in the completion of $\Delta$, so neither in $\Gamma_\Delta$ and
hence neither in $e$. But $e$ can also never contain two $(\mathit{Write}, \theta, l)$ events, since
the first would be the matching Write event for the starting pStore; the second
Write event's matching Store (maybe prescient) would have to intervene between
the starting pStore and its corresponding pAssign event by the monotonicity of
store_of, thus contradicting (M2).

Theorem 4.6. For properly synchronized programs the relation $\approx$ is a simulation
relation of $\rightarrowtriangle$ and $\rhd\!\!\to$, i.e. if $\Gamma \rightarrowtriangle \Gamma'$ during the execution of such a
program and $\Gamma \approx (\Delta, e)$ then there is a $(\Delta', e')$ such that $(\Delta, e) \rhd\!\!\to (\Delta', e')$ and
$\Gamma' \approx (\Delta', e')$.

Proof. Assume $\Gamma \approx (\Delta, e)$, i.e. $\Delta \stackrel{e}{\rightarrowtriangle} \Gamma_\Delta \sim \Gamma$. We do a case analysis for
$\Gamma \stackrel{r}{\rightarrowtriangle} \Gamma'$:

Case $r = (\mathit{Write})$: If $(\mathit{pStore}, \theta, l) \in e$ then it holds that $(\Delta, e) \rhd\!\!\to (\Delta, e \circ r)$ by
(redw). Moreover, by Lemma 4.3, $\Gamma' \approx (\Delta, e \circ r)$.
If $(\mathit{pStore}, \theta, l) \notin e$ then by Lemma 4.5, $(\Delta, e) \rhd\!\!\to (\Delta', e)$ and $\Gamma' \approx (\Delta', e)$.

Case $r = (\mathit{pAssign})$: Let $\mathit{split}_{\theta,l}(e) = (e_l, e')$. Since an Assign is always possible,
assume that $\Delta \xrightarrow{(\mathit{Assign}, \theta, l, v)} \Delta_1$. Now every action in $e_l$ becomes legal for the old
semantics, so we can further assume $\Delta_1 \xrightarrow{e_l} \Delta'$, such that $(\Delta, e) \rhd\!\!\to (\Delta', e')$.
One can prove analogously to Lemma 4.5 that the left rectangle in

$$\begin{array}{ccccc} \Delta & \stackrel{e}{\rightarrowtriangle} & \Gamma_\Delta & \sim & \Gamma \\ \downarrow (\mathit{Assign}, \theta, l, v),\ e_l & & \downarrow r & & \downarrow r \\ \Delta' & \stackrel{e'}{\rightarrowtriangle} & \Gamma_{\Delta'} & \sim & \Gamma' \end{array}$$

commutes; the right rectangle commutes by Lemma 4.3, thus $(\Delta, e) \rhd\!\!\to (\Delta', e')$
and $\Gamma' \approx (\Delta', e')$.

For pStore and Read one proceeds as for Write, all other cases follow from
Lemma 4.5.



The main result of Section 4 is the following corollary which states that the 
prescient semantics is conservative, i.e. any prescient execution sequence of a 
properly synchronized program can be simulated by a “normal” execution of 
Java. 



Corollary 4.7. Given a prescient configuration Γ from a properly synchronized
program and a normal configuration Δ, if Γ ~ Δ and Γ ─▷* Γ' such that the
event space of Γ' is complete, then for any labelling of that event space there
is a reduction sequence Δ →* Δ' such that Γ' ~ Δ'.

Moreover, if two different labellings yield two different reduction sequences
Δ →* Δ'1 and Δ →* Δ'2, then still Δ'1 ~ Δ'2 holds.

Proof. First, observe that if Γ ~ Δ then Γ ≈ (Δ, ε). By a simple induction
on the length of the derivation by Theorem 4.6, we get (Δ, ε) ▷→* (Δ', e) and
Γ' ≈ (Δ', e). Now e = ε follows from the fact that Γ' is complete, which entails
that all prescient stores are matched by an Assign such that e must be empty in
the end. From e = ε we immediately get Γ' ~ Δ'. Also from (Δ, ε) ▷→* (Δ', ε)
we can strip off a derivation Δ →* Δ' by definition of ▷→.

The second claim follows just by transitivity of ~ as Δ'1 ~ Γ' ~ Δ'2.

5 Conclusions and Future Work 

In this paper we presented a structural operational semantics of concurrent Java 
and showed its flexibility by proving a non-trivial result relating two memory 
implementations. Our semantics covers a substantial part of the dynamic be- 
haviour of the language, and we expect it to combine easily with the type system 
developed in [8]. A further ambitious step is to include in the semantics prac- 
tical features like input/output, garbage collection, distributed applications via 
sockets or remote method invocation, and applets. 

Event spaces are not necessarily “complete,” that is, no matching Load must 
necessarily occur after a Read action or Write after a Store. In fact, there are 
well-formed event spaces which are not completable, and this complicates the 
metatheory of the semantics. However, it is conceivable that completability may 
be axiomatized by means of “local” conditions such as the rules of Section 2.2. 

It might also be worthwhile to study stronger notions of “proper synchro- 
nization” (for example, by taking into account Use actions). This might simplify 
the simulation of prescient semantics and allow a synchronous treatment of Read 
and Load. 

The proofs of semantical properties (like Lemma 4.4 or Theorem 4.6) are 
combinatorial in nature; this is a typical situation where proof checkers or auto- 
mated theorem provers can be usefully employed. 

Finally, we intend to investigate operationally based notions of program 
equivalence, which may serve as foundations for program logics. Abadi and 
Leino [2] have provided an axiomatic semantics, in Hoare style, for one of the 
(sequential) object calculi of [1] and proved that the logic is sound with respect 
to the operational semantics of the object calculus in use. The development of 
such a logic for a real concurrent object-oriented language like Java remains a 
challenge. 

Acknowledgements. We thank Doug Lea for useful comments and some inspira- 
tion regarding future work. 

A Syntax



Statement ::= Block | StatementExpression ;
            | synchronized ( Expression ) Block
            | throw Expression ; | TryStatement
            | return Expression ;
            | IfThenStatement
Block ::= { BlockStatements opt }
BlockStatements ::= BlockStatement | BlockStatements BlockStatement
BlockStatement ::= LocalVariableDeclaration ; | Statement
LocalVariableDeclaration ::= Type VariableDeclarators
ReturnType ::= Type | void
Type ::= PrimitiveType | ClassType
PrimitiveType ::= boolean | int | ...
ClassType ::= Identifier
VariableDeclarators ::= VariableDeclarator
                      | VariableDeclarators , VariableDeclarator
VariableDeclarator ::= Identifier = Expression
Expression ::= AssignmentExpression
AssignmentExpression ::= Assignment | BinaryExpression
Assignment ::= LeftHandSide = AssignmentExpression
LeftHandSide ::= Name | FieldAccess
Name ::= Identifier | Name . Identifier
FieldAccess ::= Primary . Identifier
Primary ::= Literal | this | FieldAccess | ( Expression )
          | ClassInstanceCreationExpression
          | MethodInvocation
ClassInstanceCreationExpression ::= new ClassType ( )
MethodInvocation ::= Primary . Identifier ( ArgumentList opt )
ArgumentList ::= Expression | ArgumentList , Expression
BinaryExpression ::= UnaryExpression
                   | BinaryExpression BinaryOperator UnaryExpression
UnaryExpression ::= UnaryOperator UnaryExpression
                  | Primary | Name
StatementExpression ::= Assignment | ClassInstanceCreationExpression
                      | MethodInvocation
TryStatement ::= try Block Catches
               | try Block Catches opt finally Block
Catches ::= CatchClause | Catches CatchClause
CatchClause ::= catch ( Type Identifier ) Block
IfThenStatement ::= if ( Expression ) Statement

Abstract. This chapter presents a dynamic denotational semantics of
the Java programming language. This semantics covers almost the full
range of the base language, excluding only concurrency and the APIs.
A discussion of these limitations is provided in the final section of the 
chapter. 



The abstract syntax described in Chapter 1 tells us how to construct a
grammatically correct program. Every syntactically correct program describes an
environment that provides all the information about what to do during program
execution. The semantics presented in this chapter formalizes the definition of
Java program behavior as defined in the Java Language Specification (JLS).
We describe the Java environment in Section 1. Each executing program is
associated with a store that is a repository for all instance values during program
execution. The Java store is described in Section 2. Executing a Java program
begins with executing the command in the static method “main” in the given
class definition. Therefore, the result of a program depends on the semantics
of commands and the expressions in the commands. We introduce a
denotational semantics of these commands and expressions in Sections 3 and 4.
Throughout these semantics, we concurrently define two sets of semantics, a
dynamic and a static semantics, which represent the execution and
definitional denotations of the programs, respectively.



1 Environment 

An environment is the information center for the execution engine and is at the 
heart of these semantics. Our environment is a semantic domain that has two 
components, the dynamic and static semantics. The dynamic aspect of the envi- 
ronment contains the traditional environmental information related to variables, 
their types and locations in the store (as in Stoy’s classical book on denotational
semantics). It also contains control flow information for exceptions and breaks.
The static aspect of the environment contains information related to all of the 
classes used by the program. This information includes the class members, types, 
initialization functions, super class and implemented interfaces. The static part 
of the environment is determined by evaluating the input files and then is used 
as an input parameter to the denotation of the main method of the invoked class. 

Jim Alves-Foss (Ed.): Formal Syntax and Semantics of Java, LNCS 1523, pp. 201-^^^ 1999. 

(c) Springer-Verlag Berlin Heidelberg 1999 

In addition to information related to classes, their members and local
variables, the environment contains a number of auxiliary variables (all starting with
the symbol &). These variables are used to record nesting and scoping information
as well as flow control information.

— &package - specifies the fully qualified name of the package currently being
defined.

— &currentInt - specifies the fully qualified name of the interface currently
being defined.

— &currentClass - specifies the fully qualified name of the class currently
being defined.

— &Mods - provides a list of modifiers used in the current declaration.

— &Type - defines the type used in the current declaration.

— &varType - defines if this is a “Field” or “Local” variable declaration.

— &switchExpr - value of the expression of the current switch statement.

— &caseFound - boolean variable indicating if the case matching the switch
expression has been found.

— &defaultFound - boolean variable indicating if the default switch case has
been found.

— &caseCont - command continuation for execution of the appropriate switch
statement. This is needed because the default case may be defined
anywhere in the switch statement.

— &break - continuation information.

— &return - specifies the command continuation to execute upon return.

— &returnVal, &returnType - specify the return value and type for a call.

— &super - specifies the name of the superclass of the currently executing class.

— &throw - specifies the command continuation to be executed upon a throw
command.

— &thrown - specifies the (value, type) pair referring to the thrown object
(exception).

— &thisObject - specifies a reference to the current object in which execution
is occurring.

— &thisClass - specifies a reference to the class of the current object in which
execution is occurring.
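
The remark on &caseCont, that the default case may appear anywhere in a switch statement, can be observed directly in Java. The following is a small illustrative sketch; the class and method names are hypothetical and are not part of the semantics.

```java
// Illustrative only: shows that a default label need not be the last label,
// which is why the semantics must remember a continuation for it.
class SwitchDefaultDemo {
    static String trace(int n) {
        StringBuilder sb = new StringBuilder();
        switch (n) {
            default:
                sb.append("d");   // reached only when no case label matches
                // no break here: control falls through into case 1
            case 1:
                sb.append("1");
                break;
            case 2:
                sb.append("2");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace(1)); // prints 1
        System.out.println(trace(2)); // prints 2
        System.out.println(trace(5)); // prints d1
    }
}
```

All case labels are examined before the default label is chosen, regardless of textual order, so an evaluator cannot commit to the default branch when it first encounters it.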

To simplify the semantic presentation, we include within the environment a
collection of methods (or auxiliary functions). We use method invocation
notation for these functions, where γ.m(p1 ... pn) denotes invocation
of auxiliary function m, with parameters (p1 ... pn), invoked in the context of the
current environment, γ. (Note that variable names are referenced in the usual
way, with γ[name] referring to the current value of name in the environment
and γ[name ← v] denoting the new environment with name now returning
the value v.) The functions related to the dynamic semantics (execution) of a
program are:

— assnCompatible(τ, τ1). This boolean function returns true if a value of type
τ can be assigned to a variable of type τ1, according to the rules of the JLS.

— classLoader(name, store). In Java, whenever a new instance of a class is
created, we invoke the class loader. In the runtime system this involves first
determining if the class is already loaded, otherwise finding it from some
source location, loading the bytecode, and then instantiating the class
constant variables and executing any static class initializer. This function
represents this complex operation, and may result in modification to both the
environment and the store.

— condTypeOf(τt, vt, τf, vf). This function returns the type of conditional
expressions as defined by the JLS. A full definition of this function appears
following the specification of conditional expressions in the semantics that
follow.

— getArrayElem(a, ind). This function returns the location of array element
ind from the array referenced by a.

— getArrayElemType(a). This function returns the type of array elements in
the array referenced by a.

— getArrayRef(name). This function returns the reference for the array name.

— getComCont(term). This function retrieves the command continuation from the
environment auxiliary variable denoted by term, where this term can be
&break or &continue. Continuations are discussed in detail in Section 3.3.

— getMethod(name, signature). This function returns the denotation of the
named method (of the specified signature). Specifically, the value returned
is a semantic function that takes a set of arguments as parameters and
returns a command function (a function that takes an environment, command
continuation and a store and returns an answer). All appropriate searching
of the nested class and interface definitions is conducted, in accordance with
the JLS.

— getValue(term). This function returns the value of the auxiliary variable
referred to by term.
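
The lookup and update notation used for the environment, γ[name] and γ[name ← v], can be sketched as a small immutable map. This is only an illustration under stated assumptions: the class name Env, the method names get and with, and the use of a HashMap are hypothetical choices, not the representation used in the semantics.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an environment with functional update.
final class Env {
    private final Map<String, Object> bindings;

    private Env(Map<String, Object> bindings) {
        this.bindings = bindings;
    }

    static Env empty() {
        return new Env(new HashMap<>());
    }

    // Corresponds to γ[name]: the current value bound to name (null if unbound).
    Object get(String name) {
        return bindings.get(name);
    }

    // Corresponds to γ[name ← v]: a new environment; the receiver is unchanged.
    Env with(String name, Object v) {
        Map<String, Object> copy = new HashMap<>(bindings);
        copy.put(name, v);
        return new Env(copy);
    }

    public static void main(String[] args) {
        Env g0 = Env.empty();
        Env g1 = g0.with("&currentClass", "myclass");
        System.out.println(g0.get("&currentClass")); // prints null
        System.out.println(g1.get("&currentClass")); // prints myclass
    }
}
```

Returning a fresh environment rather than mutating the old one mirrors the mathematical reading, in which γ and γ[name ← v] are two distinct values that can both still be referred to.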

The auxiliary functions used to build the static (declaration) portion of the 
environment are: 

— addConstr(mods, defn, throws, body). This function is used to add the con- 
structor specification to the environment for the current class. 

— addField(name, initExp). This function is used to add the specified field and
initialization expression to the environment given the current type and class
scope.

— addLocal(name, initExp). This function is used to add the specified local
variable and initialization expression to the environment, given the current
type and class scope.

— addMethod(mod, hdrinfo, throws, body). This function is used to add the 
method specification to the environment for the current class. 

— addMethodHdr (hdrinfo). This function is used to add the abstract method 
header specification to the environment for the current interface. 

— addStaticField(name, valExpr). This function is used to add the specified
static field and initialization expression to the environment given the current
type and class scope.

— addStaticInit(com). This function is used to add the command code (or
denotation) for the specified static initializer to the class specification in the
environment.

— enterClass(mods, id, super, interfaces). This function is used to denote that
we are currently parsing a class specification. It modifies the current scope
of the environment and sets the &currentClass field of the environment to
indicate the current class.

— enterInterface(mods, id, extends). This function is used to denote that we
are currently parsing an interface specification. It modifies the current scope
of the environment and sets the &currentInt field of the environment to
indicate the current interface.

— import(name) and importOnDemand(name). These functions are used to
denote the Java import command. Specifically, they are used to add all the
class definitions from the specified files to the environment.

— instanceOf(τ1, τ2). This function returns true if τ1 is an instance of τ2 in the
current environment.

— isStatic(). This boolean function returns true if the modifiers of the current
field declaration include the static modifier.

2 Store 

The store is memory that is dynamically created, expanded, and destroyed by 
the execution engine. We can view the store as a communication channel between 
statements. Together with the environment of Section 1, it forms the state of
the execution environment. Every local variable declaration, loading of a class 
object or new operator applied to a class type or array type creates one or more 
entries in the store. If the entry is a class object, the content of the entry is 
filled according to the constructor code of the class and field initializations. An 
array object is initialized with a field name of “length” denoting the number of 
elements in the array. 

For the semantics presented in this chapter the only auxiliary function for the
store is: 

— mkException(className). This function creates a new exception object in 
the store, as defined by the exception class referred to by className. 



3 Denotational Semantics 

This section presents the (almost) full denotational semantics of the Java
language; only aspects of concurrency are missing.



3.1 Semantics Domains and Data Values 

One is often tempted, when developing a formal model of a language, to abstract 
out the limitations of the concrete representation of the language. For example,
authors of many language models will abstract values of type int to mathematical
integers. Unfortunately, this provides an unrealistic definition of the behavior and
meaning of the language constructs. For example, in the Java language there
are no run-time indications of overflow or underflow of integer operations, but
rather an implicit truncation of the resulting two’s complement representation
of the number to the requisite number of bits. Without an understanding of 
this functionality of the language and an explicit representation of it in a formal 
description of the language, correctness proofs of the code may be incorrect. To 
avoid this difficulty we represent all concrete limitations of the Java language 
in the following semantics of expressions. This is possible since Java precisely 
defines these limitations for all primitive types. 
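
The silent truncation described above is easy to observe. The following sketch is illustrative only; the class and method names are hypothetical. It contrasts int arithmetic, which wraps around, with the same sum computed in a wider type, and shows a narrowing cast keeping only the low-order bits.

```java
// Illustrative only: Java integer arithmetic wraps instead of signalling overflow.
class IntWrapDemo {
    // Sum computed in 32-bit two's complement; overflow is silently discarded.
    static int add(int a, int b) {
        return a + b;
    }

    // The same sum computed in 64 bits, i.e. the mathematical result here.
    static long addWide(int a, int b) {
        return (long) a + b;
    }

    // Narrowing conversion keeps only the low 8 bits of the value.
    static byte toByte(int v) {
        return (byte) v;
    }

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1));     // prints -2147483648
        System.out.println(addWide(Integer.MAX_VALUE, 1)); // prints 2147483648
        System.out.println(toByte(200));                   // prints -56
    }
}
```

A model that mapped int to the mathematical integers would predict 2147483648 for the first call, which is exactly the discrepancy the semantics is designed to avoid.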

3.2 Semantic Domains 

The semantic domains representing the values of the numeric data types are
defined below. To simplify the semantics, we have added two special values to each
of these domains: ⊥ (“bottom”), which represents a value with no information
content, and ⊤ (“top”), which represents a value with full (potentially conflicting)
information. The purpose of these values is to enable each domain to be a
complete partial order, which simplifies the mathematics underlying the semantics.
These values are used by the semantic functions and do not have an equivalent
representation within the Java language. The basic domains are flat domains
in that there is no implicit ordering between values of the domains other than
between the values and ⊥ and ⊤.

Let I represent the set of integers, and R represent the set of real numbers.
In the following, IEEE(s.m.e) denotes an IEEE 754 floating point number with
sign, mantissa and exponent, NAN represents not-a-number, and +∞ and −∞
represent positive and negative infinity, respectively.

Byte = {n ∈ I | −128 ≤ n ≤ 127} ∪ {⊥, ⊤}

Short = {n ∈ I | −32768 ≤ n ≤ 32767} ∪ {⊥, ⊤}

Int = {n ∈ I | −2147483648 ≤ n ≤ 2147483647} ∪ {⊥, ⊤}

Long = {n ∈ I | −9223372036854775808 ≤ n ≤ 9223372036854775807} ∪ {⊥, ⊤}

Char = {n ∈ I | 0 ≤ n ≤ 65535} ∪ {⊥, ⊤}

Float = {f ∈ R | f = IEEE(s.m.e), 0 ≤ m ≤ 2^24 − 1 ∧ −149 ≤ e ≤ 104} ∪
{⊥, ⊤, NAN, +∞, −∞}

Double = {f ∈ R | f = IEEE(s.m.e), 0 ≤ m ≤ 2^53 − 1 ∧ −1075 ≤ e ≤ 970} ∪
{⊥, ⊤, NAN, +∞, −∞}
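
The flat ordering on these domains can be made concrete with a small sketch. It is an illustration under assumptions, not part of the chapter's formal development: the class name Flat, the factory names and the method leq are hypothetical. The only ordering relations are ⊥ below everything, everything below ⊤, and each proper value related to itself.

```java
import java.util.Objects;

// Hypothetical sketch of a flat domain: proper values plus bottom and top.
final class Flat<T> {
    private enum Kind { BOTTOM, VALUE, TOP }

    private final Kind kind;
    private final T value; // non-null only when kind == VALUE

    private Flat(Kind kind, T value) {
        this.kind = kind;
        this.value = value;
    }

    static <T> Flat<T> bottom() { return new Flat<>(Kind.BOTTOM, null); }
    static <T> Flat<T> top()    { return new Flat<>(Kind.TOP, null); }
    static <T> Flat<T> of(T v)  { return new Flat<>(Kind.VALUE, Objects.requireNonNull(v)); }

    // The approximation order: this is below-or-equal other.
    boolean leq(Flat<T> other) {
        if (kind == Kind.BOTTOM || other.kind == Kind.TOP) {
            return true; // bottom is below everything, everything is below top
        }
        // Otherwise only identical proper values are related.
        return kind == other.kind && Objects.equals(value, other.value);
    }

    public static void main(String[] args) {
        Flat<Byte> a = Flat.of((byte) 127);
        Flat<Byte> b = Flat.of((byte) -128);
        System.out.println(Flat.<Byte>bottom().leq(a)); // prints true
        System.out.println(a.leq(Flat.<Byte>top()));    // prints true
        System.out.println(a.leq(b));                   // prints false
    }
}
```

Note that the numeric order on the underlying values plays no role: 127 and −128 are simply incomparable as domain elements.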

We define several other semantic domains for use within the denotational
semantics of Java presented in this chapter. Note that we deliberately avoid specifying
the detailed semantic domains of literals, but leave them abstract and presume
that a parser will interpret them correctly. Note also that in this presentation
we do not present the specifics of the domains, but rather try to define them in
a context-specific manner. For example, we typically let σ ∈ Σ be a store and
γ ∈ Γ be an environment. The values (such as v ∈ V) denote the basic values of
the Java language (shorts, ints, floats, etc.) and their types, τ ∈ T. We also refer
to locations l ∈ L as indices into the store. These are all flat domains, with a
⊥ and ⊤ value as discussed above.

3.3 Continuations 

Many of the semantic functions defined in the following sections utilize the con- 
cept of continuations. While evaluating a syntactic construct, we typically focus 
on one piece of the code. A continuation defines the semantics of the rest of 
that code (whether it be the rest of an expression, the rest of a command, all of 
a method, or the rest of a declaration). The result of a continuation is a
value of a specified semantic domain, such as an environment or an answer.
Since the core Java language does not interact with the outside world, we have
left the concept of modifications to this world as an abstract answer domain.
None of the semantics here modify that domain; such modification only occurs
in the runtime libraries (Java API). We utilize the following continuations in
these semantics: 

— ρ(γ) - package continuation. This continuation takes the environment
parameter, γ, and returns an environment based on the declarations of the
rest of the code. Note that this continuation is used in the highest level of
package/code declarations.

— δ(γ) - declaration continuation. This continuation takes the environment
parameter, γ, and returns an environment based on the declarations of the rest
of the code. This continuation is used within specific declaration constructs.

— θ(γ, σ) - command continuation. This continuation takes the environment
parameter, γ, and store parameter, σ, and returns an answer based on the
denotation of the rest of the command. The denotation is dependent on the
parameters specified, which are typically a modified store and a potentially
modified environment from a command execution.

— κ(v, τ, σ) - expression continuation. This continuation takes a value, v, of type
τ and a store parameter, σ, and returns a store based on the denotation of
the rest of the program. The denotation is dependent on the typed data
value specified, which is typically the result of an expression evaluation.

— α(v, τ, l, σ) - location continuation. This continuation takes a value, v, of
type, τ, location, l, and a store parameter, σ, and returns an answer based
on the denotation of the rest of the program. The denotation is dependent
on the typed data value specified and location, which are typically the result
of an expression evaluation and the location is the location of the variable
referenced in the expression.
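
The continuation style listed above can be sketched in Java with functional interfaces. This is a deliberately simplified illustration and rests on several assumptions that are not from the chapter: answers are plain strings, the store maps variable names directly to values rather than going through locations, every continuation returns an answer, and all names (ExprCont, CmdCont, evalIntLit, execAssign) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical continuation-passing sketch; not the chapter's actual definitions.
class ContinuationSketch {
    // Plays the role of an expression continuation: (value, type, store) -> answer.
    interface ExprCont {
        String apply(Object value, String type, Map<String, Object> store);
    }

    // Plays the role of a command continuation: (environment, store) -> answer.
    interface CmdCont {
        String apply(Map<String, Object> env, Map<String, Object> store);
    }

    // Evaluating an integer literal passes (n, "I", store) to the continuation.
    static String evalIntLit(int n, Map<String, Object> env, ExprCont k,
                             Map<String, Object> store) {
        return k.apply(n, "I", store);
    }

    // Executing "name = n;" evaluates n, then hands an updated copy of the
    // store to the command continuation that represents the rest of the program.
    static String execAssign(String name, int n, Map<String, Object> env,
                             CmdCont theta, Map<String, Object> store) {
        return evalIntLit(n, env, (v, t, s) -> {
            Map<String, Object> s1 = new HashMap<>(s);
            s1.put(name, v);
            return theta.apply(env, s1);
        }, store);
    }

    static String run() {
        // The final continuation just reports the contents of x as the answer.
        CmdCont exit = (env, s) -> "answer: x=" + s.get("x");
        return execAssign("x", 42, new HashMap<>(), exit, new HashMap<>());
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints answer: x=42
    }
}
```

Because each function receives "the rest of the program" as an argument, a construct such as break or throw can ignore the continuation it was given and invoke a different one saved earlier, which is how abnormal control flow is handled later in the chapter.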

3.4 Semantic Functions

Within the denotational semantics, we make use of several semantic functions. 
These functions define the relationship between the code, as seen by the parser, 
and the actual operations of the resulting program. Since we are working with a 
full language specification (excluding multi-threading), we need to use a wider 
range of semantic functions than those found in simpler examples in the liter- 
ature. The semantic functions used are divided into two categories, operational 
and definitional. 

Operational Semantic Functions. In the context of the Java programming 
language an operational semantic function is one that defines the relationship 
between the current language construct and the execution time behavior of the 
program. Specifically, operational semantic functions directly manipulate the
store, resulting in a new store. In the following semantics, the
operational semantic functions are:

— Command functions C⟦⟧. These functions define the meaning of Java
commands. The meaning of any Java command is defined in terms of three
parameters: the current environment γ, a command continuation θ, and the
current store σ. The command continuation is a function that defines the
behavior of the rest of the program in the context of an environment and a
store. Therefore, the result of the C⟦c⟧ function is typically θ(γ1, σ1), where
γ1 and σ1 are the new environment and store obtained from executing the
command c, and θ(γ1, σ1) represents the behavior of the program given these
values. This is not true when the command c results in an exception or an
abnormal change in the flow of control (such as a break or return command).
In these cases, the result of the C⟦c⟧ function is based on a related
continuation stored within the environment (e.g., see the semantics of the break and
return commands).

— Expression functions E⟦⟧ and location functions L⟦⟧. These functions define
the meaning of Java expressions. We separate the location functions to
distinguish those expressions that result in a value (called value expressions in the
JLS) from those that result in a reference to some memory location (called
variable expressions in the JLS). Note that modification of the store must
result in the assignment of a value to a location (either directly through an
assignment statement or through a pre- or post-increment or decrement
expression, e.g., i++).

• The E⟦⟧ functions are defined in terms of three parameters: the current
environment γ, an expression continuation κ, and the current store σ.
The expression continuation is a function that defines the behavior of
the rest of the program in the context of a value, type and a store.
Therefore, the result of the E⟦e⟧ function is typically κ(v, τ, σ1), where v
is the resulting value of type τ obtained from executing the expression
e, and σ1 is the resulting store. As with commands, exceptions do occur
that may result in using a saved expression continuation instead of the
current continuation.

• The L⟦⟧ functions are also defined in terms of three parameters: the
current environment γ, a location continuation α, and the current store
σ. The location continuation is a function that defines the behavior of the
rest of the program in the context of a value, type, location and a store.
Therefore, the result of the L⟦e⟧ function is typically α(v, τ, l, σ1), where
v is the resulting value of type τ obtained from executing the expression
e, which refers to a variable at location l, and σ1 is the resulting store.



Definitional Semantic Functions. A definitional semantic function is
ancillary to the actual execution behavior of the program; rather than describing
execution itself, it defines the context in which the execution takes place.
The definitional functions used in
the following semantics are: 

— Goal functions G⟦⟧ and package functions P⟦⟧. These functions define the
high-level meaning of a Java source file, defined in terms of import files and class
definitions. The goal function takes no parameters and returns an environment.
The environment is subsequently used during execution and provides the
full class definitions for the command and expression functions. The package
function takes two parameters, the current environment γ and a package
continuation function ρ. We use a continuation function here to be
consistent with the style of semantics presented in the command and expression
functions. The continuation function takes the newly modified environment
and returns an environment based on the rest of the file.

— Declaration functions D⟦⟧. These functions define the declarations of
methods, classes, interfaces and other lower-level constructs within the package.

— Modifier functions M⟦⟧. These functions define the list of modifiers for fields
and methods.

— Type functions T⟦⟧. These functions are used solely to determine specified
data types. These types are calculated based on the current environment,
provided as a parameter. The result of the type function is a string
representation of the data type. Specifically, we use the same string notation
that Java bytecode uses to specify types. Note that there are some cases
where multiple types must be returned (for example the list of interfaces
implemented by a class); in this case we just append the string representations
of the types as the Java bytecode does for parameter lists.

— Value functions V⟦⟧. These functions are catch-all functions that return
a value associated with the static input. A value function only takes the
current environment as a parameter and returns a pair that consists of the
value (as a basic type or string) and the type. Throughout these semantics
we may need only the first or second element of this pair. We will select
these using the fst and snd operations on the result or by direct assignment
(v, τ) = V⟦F⟧γ, where v is assigned the first value of the pair and τ is assigned
the second.

3.5 Auxiliary Functions

— mkArrayType(τ, n) - this function takes the type specified by the first
parameter and returns an array type of n dimensions.

— mkMethodValue(d, τ) - this function takes the definition specification of
a method, d, which is a pair consisting of a method name and the formal
parameters, and a return type, τ, to create a value to store in the environment
for searching and retrieving the methods.

— binaryPromoteType(τ1, τ2) - returns the type resulting from a binary
numeric promotion of the types τ1 and τ2 according to the rules of the JLS.

— unaryPromoteType(τ) - returns the type resulting from a unary numeric
promotion of type τ according to the rules of the JLS.

— promote(τ, (v, τ1)) - this function converts a value v, of type τ1, to a
compatible value of type τ following the numeric promotion rules of the JLS.

— cast(τ, (v, τ1)) - this function converts a value v, of type τ1, to a compatible
value of type τ following the type casting rules of the JLS.

— leftShift((v1, τ1), (v2, τ2)) - this function returns the value of v1, of type τ1,
left shifted v2 places (where v2 is of type τ2). The resulting value is of type
τ1.

— String(v, τ) - this function invokes the toString function of the
java.lang class corresponding to the type τ on the value v to return a string.

— fst(p) - this function returns the first value of a pair.

— snd(p) - this function returns the second value of a pair.

— append(l1, l2) - this is a simple list append function.

— isNumeric(τ) - this function returns true if the type of the parameter can
be classified as a numeric value.
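
As an illustration of the promotion functions, the JLS rules for unary and binary numeric promotion can be sketched over bytecode-style type descriptors ("B", "S", "C", "I", "J", "F", "D" for byte, short, char, int, long, float and double). The class name and the choice to take plain strings are assumptions made for this sketch; the chapter does not give these definitions explicitly.

```java
// Hypothetical sketch of the promotion helpers, following the JLS rules.
class PromotionSketch {
    // Binary numeric promotion: double wins, then float, then long, else int.
    static String binaryPromoteType(String t1, String t2) {
        if (t1.equals("D") || t2.equals("D")) return "D";
        if (t1.equals("F") || t2.equals("F")) return "F";
        if (t1.equals("J") || t2.equals("J")) return "J";
        return "I";
    }

    // Unary numeric promotion: byte, short and char widen to int;
    // int, long, float and double are left unchanged.
    static String unaryPromoteType(String t) {
        if (t.equals("B") || t.equals("S") || t.equals("C")) return "I";
        return t;
    }

    public static void main(String[] args) {
        System.out.println(binaryPromoteType("B", "S")); // prints I
        System.out.println(binaryPromoteType("I", "J")); // prints J
        System.out.println(binaryPromoteType("J", "F")); // prints F
        System.out.println(unaryPromoteType("C"));       // prints I
    }
}
```

The first case explains why adding two byte values in Java yields an int: neither operand is double, float or long, so both are promoted to int before the operation.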

4 Java Semantics 

The following sections detail the semantics of the Java language. To simplify the 
presentation, we have taken the full syntax of the Java language (as presented 
in Chapter 2 of this volume) and reduced it, by removing high-level redundancy. 
For example, syntactically we define several levels of statements, including those 
with and without trailing if’s. For the presentation here, we only worry about 
the actual statements, such as the for-statement, while-statement, etc. 

4.1 The Meaning of a Java Program 

When a user wants to execute a Java program myclass they type “java myclass
args”. In the context of the semantics presented here, this is defined as:

(γ.getMethod(“myclass.main”, “[Ljava.lang.String;”)(V⟦args⟧γ)) γ θexit σ where
γ = G⟦Goal⟧ // where Goal is the myclass.java file, and
σ = γ.classLoader(myclass, newStore()) and
θexit = the command continuation function that terminates the program

This semantics evaluates the source file (and all other imported class files)
to create a new environment γ. It then invokes a classLoader function to load
the specified class into the store, σ, and executes the method main of the class
with respect to the specified command-line arguments, the environment and the 
new store. 



4.2 Names and Literals 

The designers of the Java language separated the concept of names from other 
primary entities in the grammar. The reason for this is to avoid some
possible ambiguities in a LALR(1) parser. For the semantic functions, all we need
to return is a representation of the name to be used once the full name is de- 
fined. Since the value semantic function requires a returned (value,type) pair, 
we specify the type of names as “name” . 

V⟦<Name>⟧γ ::=
    V⟦<SimpleName>⟧γ
  | V⟦<QualifiedName>⟧γ

V⟦<SimpleName>⟧γ ::=
    V⟦Id⟧γ = (ValueOf(Id), “name”)

V⟦<QualifiedName>⟧γ ::=
    V⟦Name.Id⟧γ = (Str ++ ValueOf(Id), “name”) where
        Str = fst(V⟦Name⟧γ)

V⟦<Literal>⟧γ ::=
    V⟦IntLit⟧γ = (ValueOf(IntLit), “I”)
  | V⟦FloatLit⟧γ = (ValueOf(FloatLit), “F”)
  | V⟦BoolLit⟧γ = (ValueOf(BoolLit), “Z”)
  | V⟦CharLit⟧γ = (ValueOf(CharLit), “C”)
  | V⟦StringLit⟧γ = (ValueOf(StringLit), “Ljava.lang.String;”)
  | V⟦NullLit⟧γ = (null, “L;”)



4.3 Packages 

Goal and Compilation Unit. The following productions define the semantics of
a single Java compilation unit. This is encapsulated within a goal, which has no
parameters. The goal semantics specify the creation of a new environment and
an identity continuation such that the result of the goal will be an environment
to be used during execution. In the <CompUnit> specification we forced the
automatic loading of the java.lang package as if there were the statement “import
java.lang.*” immediately following any package declaration statement.

G⟦<Goal>⟧ ::=
    G⟦<CompUnit>⟧ = P⟦<CompUnit>⟧γρ where
        γ = newEnvironment() and
        ∀γ₁. ρ(γ₁) = γ₁

P⟦<CompUnit>⟧γρ ::=
    P⟦<PackageDecl>? <ImportDeclList>? <TypeDeclList>?⟧γρ =
        P⟦<PackageDecl>?⟧γρ₁ where
            ∀γ₁. ρ₁(γ₁) = P⟦<ImportDeclList>?⟧γ₂ρ₂ where
                γ₂ = γ₁.importOnDemand(java.lang) and
                ∀γ₂. ρ₂(γ₂) = P⟦<TypeDeclList>?⟧γ₂ρ

P⟦<PackageDecl>⟧γρ ::=
    P⟦package <Name> ;⟧γρ = ρ(γ[&package ← fst(V⟦<Name>⟧γ)])

Import commands. The import commands cause some difficulty in the semantics. 
Specifically, an import command loads into the environment all the relevant 
information related to an entity specified in a separate compilation unit. For the 
sake of brevity we include auxiliary functions that modify the environment to 
reflect the action of the import command. 

P⟦<ImportDeclList>⟧γρ ::=
    P⟦<ImportDecl>⟧γρ
  | P⟦<ImportDeclList₁> <ImportDecl>⟧γρ =
        P⟦<ImportDeclList₁>⟧γρ₁ where
            ∀γ₁. ρ₁(γ₁) = P⟦<ImportDecl>⟧γ₁ρ

P⟦<ImportDecl>⟧γρ ::=
    P⟦<SingleTypeImportDecl>⟧γρ
  | P⟦<TypeImportOnDemandDecl>⟧γρ

P⟦<SingleTypeImportDecl>⟧γρ ::=
    P⟦import <Name> ;⟧γρ = ρ(γ.import(fst(V⟦<Name>⟧γ)))

P⟦<TypeImportOnDemandDecl>⟧γρ ::=
    P⟦import <Name> . * ;⟧γρ = ρ(γ.importOnDemand(fst(V⟦<Name>⟧γ)))
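
The way the package continuation threads the environment through the package declaration, the forced java.lang import, and the import list can be sketched in Python. The dictionary layout of the environment and all function names below are assumptions made for illustration; the real environment would load the imported entities rather than only record their names.

```python
def new_environment():
    # Assumed environment layout: just enough state to observe the imports.
    return {"package": None, "imports": [], "on_demand": []}

def p_package_decl(name, env, rho):
    """package <Name> ; passes the updated environment to the continuation rho."""
    return rho({**env, "package": name})

def p_import_decl(decl, env, rho):
    """decl is ("single", name) or ("on_demand", name)."""
    kind, name = decl
    if kind == "single":
        return rho({**env, "imports": env["imports"] + [name]})
    return rho({**env, "on_demand": env["on_demand"] + [name]})

def p_import_decl_list(decls, env, rho):
    """Evaluate the front of the list first, then the last declaration."""
    if not decls:
        return rho(env)
    *init, last = decls
    return p_import_decl_list(
        init, env, lambda env1: p_import_decl(last, env1, rho))

def g_goal(package_name, import_decls):
    """Start from a fresh environment with the identity continuation."""
    identity = lambda env: env
    def after_package(env1):
        # java.lang is imported on demand right after the package declaration.
        env2 = {**env1, "on_demand": env1["on_demand"] + ["java.lang"]}
        return p_import_decl_list(import_decls, env2, identity)
    return p_package_decl(package_name, new_environment(), after_package)
```

Calling `g_goal("demo", [("single", "java.util.List"), ("on_demand", "java.io")])` returns the final environment, with java.lang recorded ahead of the explicit on-demand imports.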

Class and Interface Declarations. The class and interface declaration produc- 
tions are defined in terms of the declaration semantic function. The following 
semantics define the relationship between the package and declaration semantic 
functions. 

P⟦<TypeDeclList>⟧γρ ::=
    P⟦<TypeDecl>⟧γρ
  | P⟦<TypeDeclList> <TypeDecl>⟧γρ =
        P⟦<TypeDeclList>⟧γρ₁ where
            ∀γ₁. ρ₁(γ₁) = D⟦<TypeDecl>⟧γ₁δ where
                ∀γ₁. δ(γ₁) = ρ(γ₁)

D⟦<TypeDecl>⟧γδ ::=
    D⟦<ClassDecl>⟧γδ
  | D⟦<InterfaceDecl>⟧γδ
  | D⟦;⟧γδ = δ(γ)
4.4 Types

The T⟦⟧ semantic function returns the type defined by the given syntactic
construct. The returned type is a string representation of the specified type
using the notation defined in the JVM specification. Specifically, the returned
values are:

Type                  Return Value
boolean               “Z”
byte                  “B”
short                 “S”
char                  “C”
int                   “I”
long                  “J”
float                 “F”
double                “D”
void                  “V”
array of Type         “[Type”
class or interface    “Lclassname;”
method*               (t₁t₂...tₙ)tᵣ

* for methods of the form:
      return-type meth-name(parm₁, parm₂, ..., parmₙ)
  where tᵣ is the return type and tᵢ is the type of parmᵢ.



T⟦<Type>⟧γ ::=
    T⟦<PrimitiveType>⟧γ
  | T⟦<ReferenceType>⟧γ

T⟦<PrimitiveType>⟧γ ::=
    T⟦<NumericType>⟧γ
  | T⟦boolean⟧γ = “Z”

T⟦<NumericType>⟧γ ::=
    T⟦<IntegralType>⟧γ
  | T⟦<FloatingPointType>⟧γ

T⟦<IntegralType>⟧γ ::=
    T⟦byte⟧γ = “B”
  | T⟦int⟧γ = “I”
  | T⟦long⟧γ = “J”
  | T⟦short⟧γ = “S”
  | T⟦char⟧γ = “C”

T⟦<FloatingPointType>⟧γ ::=
    T⟦float⟧γ = “F”
  | T⟦double⟧γ = “D”

T⟦<ReferenceType>⟧γ ::=
    T⟦<ClassOrInterfaceType>⟧γ
  | T⟦<ArrayType>⟧γ

T⟦<ClassOrInterfaceType>⟧γ ::=
    T⟦<Name>⟧γ = “L” + fst(V⟦<Name>⟧γ) + “;”

T⟦<ClassType>⟧γ ::=
    T⟦<ClassOrInterfaceType>⟧γ

T⟦<InterfaceType>⟧γ ::=
    T⟦<ClassOrInterfaceType>⟧γ

T⟦<ArrayType>⟧γ ::=
    T⟦<PrimitiveType> [ ]⟧γ = mkArrayType(τ₁, 1) where
        τ₁ = T⟦<PrimitiveType>⟧γ
  | T⟦<Name> [ ]⟧γ = mkArrayType(τ₁, 1) where
        τ₁ = fst(V⟦<Name>⟧γ)
  | T⟦<ArrayType> [ ]⟧γ = mkArrayType(τ₁, 1) where
        τ₁ = T⟦<ArrayType>⟧γ

where
    mkArrayType(τ, n) = τ                             when n = 0
                      = “[” + mkArrayType(τ, n - 1)   when n > 0
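
As a small illustration of the type strings produced by T⟦⟧, the following Python sketch builds class, array and method type strings. The function names are assumptions, and class names are spelled with dots as in the rules above.

```python
def class_type(name):
    """Class or interface type: "L" + name + ";"."""
    return "L" + name + ";"

def mk_array_type(tau, n):
    """Prefix the element type tau with n array markers."""
    if n == 0:
        return tau
    return "[" + mk_array_type(tau, n - 1)

def method_type(param_types, return_type):
    """Method type (t1 t2 ... tn)tr built from already computed type strings."""
    return "(" + "".join(param_types) + ")" + return_type
```

With these helpers, the type of `main` is `method_type([mk_array_type(class_type("java.lang.String"), 1)], "V")`, which matches the “[Ljava.lang.String;” argument type used when looking up myclass.main.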



4.5 Modifiers 

Modifiers specify the access constraints of classes, methods and fields in Java
programs. As such, we need to specify the list of modifiers for the declaration
semantic functions that use them. The M⟦⟧ semantic function returns all
modifiers as a list of strings.

M⟦<Modifiers>⟧ ::=
    M⟦<Modifier>⟧
  | M⟦<Modifiers₁> <Modifier>⟧ = cons(M⟦Modifiers₁⟧, M⟦Modifier⟧)

M⟦<Modifier>⟧ ::=
    M⟦public⟧ = “public”
  | M⟦private⟧ = “private”
  | M⟦protected⟧ = “protected”
  | M⟦static⟧ = “static”
  | M⟦abstract⟧ = “abstract”
  | M⟦final⟧ = “final”
  | M⟦native⟧ = “native”
  | M⟦synchronized⟧ = “synchronized”
  | M⟦transient⟧ = “transient”
  | M⟦volatile⟧ = “volatile”



4.6 Interface Declarations 

Interfaces specify a group of classes. Within each interface is a set of nested
class and interface declarations (starting in Java 1.1), constant fields and
abstract methods. In addition, an interface can extend another interface. All of
this syntax represents a declaration abstraction for a group of classes. When a
class is declared to implement an interface, all of the interface body declarations
are included at the beginning of the class declaration. The auxiliary function
enterInterface modifies the current environment to include the new interface
declaration and members. The current interface is defined in terms of the name
of the declared interface for the remainder of the declaration; it then reverts to
the calling interface name for the continuation.

D⟦<InterfaceDecl>⟧γδ ::=
    D⟦<Modifiers>? interface <Id> <Extends>? <InterfaceBody>⟧γδ =
        D⟦<InterfaceBody>⟧γ₁δ₁ where
            γ₁ = γ.enterInterface(M⟦<Modifiers>⟧,
                                  V⟦<Id>⟧γ,
                                  T⟦<Extends>⟧γ) and
            ∀γ₂. δ₁(γ₂) = δ(γ₂[&currentInt ← γ(&currentInt)])

T⟦<Extends>⟧γ ::=
    T⟦extends <InterfaceType>⟧γ = T⟦<InterfaceType>⟧γ
  | T⟦<Extends> , <InterfaceType>⟧γ =
        T⟦<Extends>⟧γ + T⟦<InterfaceType>⟧γ

D⟦<InterfaceBody>⟧γδ ::=
    D⟦{ <InterfaceMemberDeclList>? }⟧γδ =
        D⟦<InterfaceMemberDeclList>⟧γδ

D⟦<InterfaceMemberDeclList>⟧γδ ::=
    D⟦<InterfaceMemberDecl>⟧γδ
  | D⟦<InterfaceMemberDeclList₁> <InterfaceMemberDecl>⟧γδ =
        D⟦<InterfaceMemberDeclList₁>⟧γδ₁ where
            ∀γ₁. δ₁(γ₁) = D⟦<InterfaceMemberDecl>⟧γ₁δ

D⟦<InterfaceMemberDecl>⟧γδ ::=
    D⟦<ClassDecl>⟧γδ
  | D⟦<InterfaceDecl>⟧γδ
  | D⟦<AbsMethodDecl>⟧γδ
  | D⟦<ConstantDecl>⟧γδ

D⟦<AbsMethodDecl>⟧γδ ::=
    D⟦<MethodHdr> ;⟧γδ = δ(γ₁) where
        γ₁ = γ.addMethodHdr(v) and
        v = V⟦<MethodHdr>⟧γ

D⟦<ConstantDecl>⟧γδ ::=
    D⟦<FieldDecl>⟧γδ

4.7 Class Declarations 

The following grammar presents class declarations. As with interfaces, a class
declaration enters a new class, thus modifying the environment. The environment
includes the definitions of all members of the class and links to the super class
and implemented interfaces.

D⟦<ClassDecl>⟧γδ ::=
    D⟦<Modifiers>? class <Id> <Super>? <Interfaces>? <ClassBody>⟧γδ =
        D⟦<ClassBody>⟧γ₁δ₁ where
            γ₁ = γ.enterClass(M⟦<Modifiers>⟧,
                              V⟦<Id>⟧γ,
                              T⟦<Super>⟧γ,
                              T⟦<Interfaces>⟧γ) and
            ∀γ₂. δ₁(γ₂) = δ(γ₂[&currentClass ← γ(&currentClass)])

T⟦<Super>⟧γ ::=
    T⟦extends <ClassType>⟧γ = T⟦<ClassType>⟧γ

T⟦<Interfaces>⟧γ ::=
    T⟦implements <InterfaceTypeList>⟧γ = T⟦<InterfaceTypeList>⟧γ

T⟦<InterfaceTypeList>⟧γ ::=
    T⟦<InterfaceType>⟧γ
  | T⟦<InterfaceTypeList₁> , <InterfaceType>⟧γ =
        T⟦<InterfaceTypeList₁>⟧γ + T⟦<InterfaceType>⟧γ

D⟦<ClassBody>⟧γδ ::=
    D⟦<ClassBodyDeclList>?⟧γδ

D⟦<ClassBodyDeclList>⟧γδ ::=
    D⟦<ClassBodyDecl>⟧γδ
  | D⟦<ClassBodyDeclList₁> <ClassBodyDecl>⟧γδ

The Class Body consists of class members, which include nested classes, nested
interfaces, fields and methods, together with constructors and static initializers
(which are class-level constructors invoked the first time a class is activated).
It is important to note that when a class is activated (denoted in these semantics
by the classLoader function), the parent class is loaded, then all static variables
are initialized and all static initializers are executed in the order they appear
in the class declaration. The addStaticInit routine used below and the
addStaticField routine used in the Field Declaration section enter the partial
semantic functions for these initializers into the environment. The classLoader
function recovers these partial functions and completes them with the current
parameters.
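
The activation order described above (parent first, then static fields and static initializers in textual order) can be modelled with a short Python sketch. The dictionary layout of classes and stores, and the names `static_init` and `class_loader`, are assumptions for illustration and not the paper's definitions.

```python
def static_init(field, compute):
    """A stored partial function: given a store, initialize one static field."""
    def run(store):
        statics = store["statics"]
        return {**store, "statics": {**statics, field: compute(statics)}}
    return run

def class_loader(classes, name, store):
    """Activate `name`: load its parent first, then run initializers in order."""
    if name in store["loaded"]:
        return store  # a class is activated only once
    info = classes[name]
    if info["parent"] is not None:
        store = class_loader(classes, info["parent"], store)
    store = {**store, "loaded": store["loaded"] + [name]}
    for init in info["static_inits"]:  # textual order within the class
        store = init(store)
    return store

# Hypothetical classes: Derived.z depends on a parent static and an earlier field.
CLASSES = {
    "Base": {"parent": None,
             "static_inits": [static_init("Base.x", lambda s: 1)]},
    "Derived": {"parent": "Base",
                "static_inits": [
                    static_init("Derived.y", lambda s: 2),
                    static_init("Derived.z",
                                lambda s: s["Base.x"] + s["Derived.y"]),
                ]},
}
```

Loading "Derived" into an empty store `{"loaded": [], "statics": {}}` activates Base before Derived, so the initializer for Derived.z can already read Base.x.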

D⟦<ClassBodyDecl>⟧γδ ::=
    D⟦<ClassMemberDecl>⟧γδ
  | D⟦<StaticInit>⟧γδ =
        δ(γ.addStaticInit(C⟦<StaticInit>⟧))
  | D⟦<ConstrDecl>⟧γδ

D⟦<ClassMemberDecl>⟧γδ ::=
    D⟦<ClassDecl>⟧γδ
  | D⟦<InterfaceDecl>⟧γδ
  | D⟦<FieldDecl>⟧γδ
  | D⟦<MethodDecl>⟧γδ

4.8 Method Declarations 

We have slightly modified the method declaration productions from the grammar
presented in the JLS, including modifiers and throws directly in the method
declaration instead of in the header. This was done to simplify the semantic
functions. The major action of these productions is to add a method into the
environment of the current class. Associated with the type signature and name of
the method is a partial semantic function that defines the operational behavior
of the method. When the method is invoked, values of the actual parameters are
passed to the formal parameters of the method, and then the method body is
executed using the new values. Any resultant value is returned as the result of the
method semantic function. The mkMethodValue function takes the method
name and formal parameter type list and returns what we term a method value.
This method value specifies the name and type signature of the method along
with the names of the formal parameters. The exact details of this notation are
not important here; it is sufficient to know that this information is used by the
addMethod routine.

D⟦<MethodDecl>⟧γδ ::=
    D⟦<Modifiers>? <MethodHdr> <Throws>? <MethodBody>⟧γδ =
        γ.addMethod(M⟦<Modifiers>⟧, V⟦<MethodHdr>⟧γ,
                    T⟦<Throws>⟧γ, C⟦<MethodBody>⟧)

V⟦<MethodHdr>⟧γ ::=
    V⟦<Type> <MethodDef>⟧γ = mkMethodValue(V⟦<MethodDef>⟧γ, <Type>)
  | V⟦void <MethodDef>⟧γ = mkMethodValue(V⟦<MethodDef>⟧γ, void)

V⟦<MethodDef>⟧γ ::=
    V⟦<Id> ( <FormalParmList>? )⟧γ =
        (fst(V⟦<Id>⟧γ), V⟦<FormalParmList>⟧γ)
  | V⟦<MethodDef> [ ]⟧γ = mkArrayType(V⟦<MethodDef>⟧γ, 1)

The formal parameter list returns a pair that consists of a list of the names of
each of the parameters and a list of corresponding types. The following semantic
functions return this pair. The throws clause returns a type that corresponds to
the types of each of the thrown classes.

V⟦<FormalParmList>⟧γ ::=
    V⟦<FormalParam>⟧γ
  | V⟦<FormalParmList> , <FormalParam>⟧γ =
        V⟦<FormalParmList>⟧γ + V⟦<FormalParam>⟧γ

V⟦<FormalParam>⟧γ ::=
    V⟦<Modifier> <Type> <VarDeclId>⟧γ = (fst(V⟦<VarDeclId>⟧γ), T⟦<Type>⟧γ)

T⟦<Throws>⟧γ ::=
    T⟦throws <ClassTypeList>⟧γ = T⟦<ClassTypeList>⟧γ

T⟦<ClassTypeList>⟧γ ::=
    T⟦<ClassType>⟧γ
  | T⟦<ClassTypeList> , <ClassType>⟧γ =
        T⟦<ClassTypeList>⟧γ + T⟦<ClassType>⟧γ

C⟦<MethodBody>⟧γθσ ::=
    C⟦;⟧γθσ = θ(γ, σ)
  | C⟦<Block>⟧γθσ

4.9 Field and Variable Declarations 

The following semantic functions define the meaning of the field and variable 
declarations. These declarations are used to modify the environment to define 
static and regular fields and local variables. 

D⟦<FieldDecl>⟧γδ ::=
    D⟦<Modifiers>? <Type> <VarDecl> ;⟧γδ = D⟦<VarDecl>⟧γ₁δ where
        γ₁ = γ[&Mods ← M⟦<Modifiers>⟧,
               &Type ← T⟦<Type>⟧γ,
               &varType ← “Field”]

D⟦<VarDeclList>⟧γδ ::=
    D⟦<VarDecl>⟧γδ
  | D⟦<VarDeclList> , <VarDecl>⟧γδ = D⟦<VarDeclList>⟧γδ₁ where
        ∀γ₁. δ₁(γ₁) = D⟦<VarDecl>⟧γ₁δ

D⟦<VarDecl>⟧γδ ::=
    D⟦<VarDeclId>⟧γδ = δ(γ₁) where
        let (name, type) = V⟦<VarDeclId>⟧γ in
        γ₁ =
            if (γ(&varType) == “Field”)
                if (γ.isStatic())
                    γ.addStaticField(name, defaultInit(type))
                else
                    γ.addField(name, defaultInit(type))
                endif
            else // (γ(&varType) == “Local”)
                γ.addLocal(name, defaultInit(type))
            endif
  | D⟦<VarDeclId> = <VarInit>⟧γδ = δ(γ₁) where
        let (name, type) = V⟦<VarDeclId>⟧γ in
        γ₁ =
            if (γ(&varType) == “Field”)
                if (γ.isStatic())
                    γ.addStaticField(name, E⟦<VarInit>⟧)
                else
                    γ.addField(name, E⟦<VarInit>⟧)
                endif
            else // (γ(&varType) == “Local”)
                γ.addLocal(name, E⟦<VarInit>⟧)
            endif

V⟦<VarDeclId>⟧γ ::=
    V⟦<Id>⟧γ = (ValueOf(<Id>), γ(&Type))
  | V⟦<VarDeclId> [ ]⟧γ = (ValueOf(<Id>), mkArrayType(γ(&Type), 1))



4.10 Initializers 

Initializers consist of both static block initializers for classes and field and local 
variable initializers. All initializers are simply evaluated upon instantiation of 
the class, field or variable. For fields and variables the resultant value is a pair 
consisting of a list of values and a list of types corresponding to these values. 

C⟦<StaticInit>⟧γθσ ::=
    C⟦static <Block>⟧γθσ = C⟦<Block>⟧γθσ

E⟦<VarInits>⟧γκσ ::=
    E⟦<VarInit>⟧γκσ
  | E⟦<VarInits₁> , <VarInit>⟧γκσ = E⟦<VarInits₁>⟧γκ₁σ where
        ∀r₁, τ₁, σ₁. κ₁(r₁, τ₁, σ₁) = E⟦<VarInit>⟧γκ₂σ₁ where
            ∀r₂, τ₂, σ₂. κ₂(r₂, τ₂, σ₂) = κ(r, τ, σ₂) where
                r = append(r₁, r₂) and
                τ = τ₁ + τ₂

E⟦<VarInit>⟧γκσ ::=
    E⟦<Expression>⟧γκσ
  | E⟦<ArrayInitializer>⟧γκσ

E⟦<ArrayInit>⟧γκσ ::=
    E⟦{ <VarInits>? }⟧γκσ = E⟦<VarInits>⟧γκ₁σ where
        ∀r, τ, σ. κ₁(r, τ, σ) = κ(r, τ₁, σ) where
            τ₁ = mkArrayType(τ, 1)



4.11 Constructor Declarations 

The constructor semantic functions define the meaning of object constructors. It
is important to understand that when a constructor is invoked, it either calls an
implicit constructor of the super class or an explicit constructor. This is denoted
in the semantic functions below. The instantiateClass function of stores returns
a triple consisting of a value (the reference to the new object), a type (of the
object), and a new store that contains the new locations for the fields of the
object.
D⟦<ConstrDecl>⟧γδ ::=
    D⟦<Modifiers>? <ConstrDef> <Throws>? <ConstrBody>⟧γδ = δ(γ₁) where
        γ₁ = γ.addConstr(M⟦<Modifiers>⟧, V⟦<ConstrDef>⟧γ,
                         T⟦<Throws>⟧γ, E⟦<ConstrBody>⟧)

V⟦<ConstrDef>⟧γ ::=
    V⟦<SimpleName> ( <FormalParmList>? )⟧γ =
        (fst(V⟦<SimpleName>⟧γ), V⟦<FormalParmList>⟧γ)

E⟦<ConstrBody>⟧γκσ ::=
    E⟦{ <ExplConstrInv> <BlockStmtList>? }⟧γκσ =
        E⟦<ExplConstrInv>⟧γκ₁σ where
            ∀r, τ, σ₁. κ₁(r, τ, σ₁) = θ(σ₁) where
                ∀σ₂. θ(σ₂) = C⟦<BlockStmtList>⟧γθ₁σ₃ where
                    let (r₁, τ₁, σ₃) = σ₂.instantiateClass(τ) in
                    ∀σ₄. θ₁(σ₄) = κ(r₁, τ₁, σ₄)
  | E⟦{ <BlockStmtList>? }⟧γκσ =
        E⟦super ()⟧γκ₁σ where
            ∀r, τ, σ₁. κ₁(r, τ, σ₁) = θ(σ₁) where
                ∀σ₂. θ(σ₂) = C⟦<BlockStmtList>⟧γθ₁σ₃ where
                    let (r₁, τ₁, σ₃) = σ₂.instantiateClass(τ) in
                    ∀σ₄. θ₁(σ₄) = κ(r₁, τ₁, σ₄)

E⟦<ExplConstrInv>⟧γκσ ::=
    E⟦this ( <ArgList>? ) ;⟧γκσ
  | E⟦super ( <ArgList>? ) ;⟧γκσ

4.12 Blocks and Statements 

In this section, we present the semantics for blocks and statements in the Java 
language. We differ from the grammar in the JLS by not presenting any of 
the statements associated with the No Short If constructs, used in the JLS to 
avoid syntactic ambiguity with dangling else clauses. The semantics for all of 
these clauses can be easily derived from the semantics presented here. 

Blocks. A block consists of an optional sequence of block statements within a
pair of braces. Note here that if there are no block statements, the semantics of
the block are equivalent to θ(γ, σ).

C⟦<Block>⟧γθσ ::=
    C⟦{ <BlockStmtList>? }⟧γθσ = C⟦BlockStmtList⟧γθσ

C⟦<BlockStmtList>⟧γθσ ::=
    C⟦<BlockStmt>⟧γθσ
  | C⟦<BlockStmtList₁> <BlockStmt>⟧γθσ = C⟦<BlockStmtList₁>⟧γθ₁σ where
        ∀γ₁, σ₁. θ₁(γ₁, σ₁) = C⟦<BlockStmt>⟧γ₁θσ₁

C⟦<BlockStmt>⟧γθσ ::=
    C⟦<LocalVarDeclStmt>⟧γθσ
  | C⟦<Stmt>⟧γθσ

C⟦<LocalVarDeclStmt>⟧γθσ ::=
    C⟦<LocalVarDecl> ;⟧γθσ

Local Variable Declarations. These modify both the environment and the store 
(local store), by creating a new semantic entity. As such, a local variable decla- 
ration will continue program execution with these new attributes. 

C⟦<LocalVarDecl>⟧γθσ ::=
    C⟦<Type> <VarDeclList>⟧γθσ = D⟦<VarDeclList>⟧γδσ where
        ∀d, γ₁, σ₁. δ(d, γ₁, σ₁) = θ(γ₂, σ₁) where
            γ₂ = γ₁[&Type ← T⟦<Type>⟧γ, &varType ← “Local”]

Empty, Labeled and Expression Statements. These statements are basic primitive
statements of the Java language. The empty statement consists solely of a
single semicolon and semantically continues operation as if nothing happened.
The labeled statement modifies the environment to contain an identifier, Id, that
refers to the current statement. The environment maintains the semantic
evaluation of the statement as a function of the current statement, parameterized
by a possibly new environment and store. Note that the environment of the
statement contains a reference to the label, Id, but upon completion of execution
that label is removed from the environment. The expression statement evaluates
the expression using the semantic function for expressions, discarding any
returned value or type, and continues execution using the possibly modified store.
Note that we have simplified an expression statement to consist of any expression,
although this is not strictly true. In the JLS, the grammar restricts expression
statements to a list of possible expressions. We take liberty with our assumption
of syntactically correct programs to simplify the grammar here.

C⟦<EmptyStmt>⟧γθσ ::=
    C⟦;⟧γθσ = θ(γ, σ)

C⟦<LabeledStmt>⟧γθσ ::=
    C⟦<Id> : <Stmt>⟧γθσ = C⟦<Stmt>⟧γ₁θ₁σ where
        γ₁ = γ[Id ← θ₂] where
            ∀γ₂, σ₂. θ₂(γ₂, σ₂) = C⟦<Stmt>⟧γ₂θ₁σ₂
            ∀γ₁, σ₁. θ₁(γ₁, σ₁) = θ(γ, σ₁)

C⟦<ExprStmt> ;⟧γθσ ::=
    C⟦<Expr>⟧γθσ = E⟦<Expr>⟧γκσ where
        ∀r, τ, σ₁. κ(r, τ, σ₁) = θ(γ, σ₁)

If Statements. The if statement has two forms, one with and one without an 
else clause. The if statement executes the expression first, possibly modifying 
the store, and then behaves as the first statement if the expression is true. If 
the expression is false it either continues execution or behaves as the statement 
following the else clause. 
C⟦<IfStmt>⟧γθσ ::=
    C⟦if ( <Expr> ) <Stmt>⟧γθσ = E⟦<Expr>⟧γκσ where
        ∀r, τ, σ₁. κ(r, τ, σ₁) =
            if (r == true)
                C⟦<Stmt>⟧γθσ₁
            else
                θ(γ, σ₁)
            endif

C⟦<IfElseStmt>⟧γθσ ::=
    C⟦if ( <Expr> ) <Stmt₁> else <Stmt₂>⟧γθσ = E⟦<Expr>⟧γκσ where
        ∀r, τ, σ₁. κ(r, τ, σ₁) =
            if (r == true)
                C⟦<Stmt₁>⟧γθσ₁
            else
                C⟦<Stmt₂>⟧γθσ₁
            endif
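
The continuation-passing reading of the if statement can be tried out with a small Python model. Here a command is a function of (environment, command continuation, store) and an expression is a function of (environment, expression continuation, store); the helper names `e_var`, `c_assign`, `c_if` and `halt` are assumptions introduced for this sketch.

```python
def e_var(name):
    """Expression: read a boolean variable and pass (value, type, store) on."""
    return lambda env, kappa, store: kappa(store[name], "Z", store)

def c_assign(name, value):
    """Command: store a value, then continue with the command continuation."""
    return lambda env, theta, store: theta(env, {**store, name: value})

def c_if(expr, then_stmt, else_stmt=None):
    """if ( expr ) then_stmt [else else_stmt] in continuation-passing style."""
    def command(env, theta, store):
        def kappa(r, tau, store1):
            if r is True:
                return then_stmt(env, theta, store1)
            if else_stmt is not None:
                return else_stmt(env, theta, store1)
            return theta(env, store1)  # no else clause: just continue
        return expr(env, kappa, store)
    return command

# Final continuation for experiments: return the store that reaches the end.
halt = lambda env, store: store
```

For instance, `c_if(e_var("flag"), c_assign("x", 1), c_assign("x", 2))({}, halt, {"flag": False})` runs the else branch and returns a store in which x is 2.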

The Switch Statement. The Java switch statement presents a few problems for 
the design of a denotational semantics. The problems we found and their reso- 
lution are discussed below. The approach we took involves modification of the 
environment to provide additional information to subsequent semantic functions. 
This modification is in the form of auxiliary variables. Note that these variables 
must be restored to their previous values upon completion of the switch state- 
ment to permit the correct evaluation of nested switch statements. 

— The data value obtained upon execution of the switch statement determines 
which case to execute. Thus this value must be carried along through the 
semantic functions until it is utilized. We decided to maintain the value in 
the environment under the auxiliary variable name &switchExpr.

— Once a case label has been found to match the switch expression, all subse- 
quent switch block statements are to be executed. Thus, the semantic mean- 
ing of these statements is dependent on whether or not a case label matched 
the switch expression. We decided to maintain a boolean flag, &caseFound,
in the environment to indicate whether or not a match has been found. 

— The default switch case label may occur any place a case label may occur. 
If no case label matches the switch expression, the meaning of the switch 
statement is the meaning of all switch block statements that follow the de- 
fault label. The problem is that not only do we have to inform the semantic 
evaluation functions that a default label has been found, but the functions 
also have to allow for the existence of a matching case label occurring af- 
ter the default label. The first problem is resolved with the boolean flag 
&defaultFound, which operates the same as the &caseFound flag. The
other problem is resolved using the &caseCont variable, which records the
environment, store and continuation parameters for the switch statement. 

— An unlabelled break statement may occur within a switch statement. The intent
of this statement is to terminate execution of the switch statement. As such, 
the break information is stored in the environment. 
C⟦<SwitchStmt>⟧γθσ ::=
    C⟦switch ( <Expr> ) <SwitchBlock>⟧γθσ =
        E⟦<Expr>⟧γκσ where
            ∀r, τ, σ₁. κ(r, τ, σ₁) = C⟦<SwitchBlock>⟧γ₁θ₁σ₁ where
                γ₁ = γ[&switchExpr ← r, &caseFound ← false,
                       &defaultFound ← false, &caseCont ← (γ, θ₁, σ₁),
                       &break ← θ₁] and
                ∀γ₁, σ₂. θ₁(γ₁, σ₂) = θ(γ, σ₂)

C⟦<SwitchBlock>⟧γθσ ::=
    C⟦{ <SwitchBlockStmtList>? <SwitchLabelList>? }⟧γθσ =
        C⟦<SwitchBlockStmtList>?⟧γθ₁σ where
            ∀γ₁, σ₂. θ₁(γ₁, σ₂) = C⟦<SwitchLabelList>⟧γ₁θσ₂

C⟦<SwitchBlockStmtList>⟧γθσ ::=
    C⟦<SwitchBlockStmt>⟧γθσ
  | C⟦<SwitchBlockStmtList₁> <SwitchBlockStmt>⟧γθσ =
        C⟦<SwitchBlockStmtList₁>⟧γθ₁σ where
            ∀γ₁, σ₁. θ₁(γ₁, σ₁) = C⟦<SwitchBlockStmt>⟧γ₁θσ₁

C⟦<SwitchBlockStmt>⟧γθσ ::=
    C⟦<SwitchLabelList> <BlockStmtList>⟧γθσ =
        if (γ.getValue(&caseFound) == true)
            C⟦BlockStmtList⟧γθσ
        else
            C⟦<SwitchLabelList>⟧γθ₁σ where
                ∀γ₁, σ₁. θ₁(γ₁, σ₁) =
                    if (γ₁(&caseFound) == true)
                        C⟦BlockStmtList⟧γ₁(&caseCont)
                    else if γ₁(&defaultFound)
                        C⟦BlockStmtList⟧γ₁θσ₁
                    else
                        θ(γ₁, σ₁)
                    endif
        endif

C⟦<SwitchLabelList>⟧γθσ ::=
    C⟦<SwitchLabel>⟧γθσ
  | C⟦<SwitchLabelList₁> <SwitchLabel>⟧γθσ =
        C⟦<SwitchLabelList₁>⟧γθ₁σ where
            ∀γ₁, σ₁. θ₁(γ₁, σ₁) = C⟦<SwitchLabel>⟧γ₁θσ₁

C⟦<SwitchLabel>⟧γθσ ::=
    C⟦case <ConstExpr>⟧γθσ = E⟦<ConstExpr>⟧γκσ where
        ∀r, τ, σ₁. κ(r, τ, σ₁) =
            if (fst(r) == γ[&switchExpr])
                θ(γ[&caseFound ← true], σ₁)
            else
                θ(γ, σ₁)
            endif
  | C⟦default :⟧γθσ = θ(γ[&defaultFound ← true], σ)
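
The behavior these switch rules are meant to capture (fall-through after a match, a default label that may appear anywhere, and a later matching case taking precedence over an earlier default) can be checked against a direct Python model. This sketch is not a transcription of the continuation-based rules: it uses an ordinary loop, and the names `DEFAULT`, `run_switch`, `emit` and `brk` are assumptions.

```python
DEFAULT = object()  # stands for the `default` label

def run_switch(value, groups, store):
    """groups: list of (labels, statements); a statement returns "break" or None."""
    def run_from(start):
        # Fall through all following groups until a break is executed.
        for _, stmts in groups[start:]:
            for stmt in stmts:
                if stmt(store) == "break":
                    return

    default_at = None
    for index, (labels, _) in enumerate(groups):
        if any(lab is not DEFAULT and lab == value for lab in labels):
            run_from(index)          # plays the role of &caseFound
            return store
        if default_at is None and any(lab is DEFAULT for lab in labels):
            default_at = index       # plays the role of &defaultFound
    if default_at is not None:
        run_from(default_at)         # no case matched: start at default
    return store

def emit(text):
    """Statement that appends text to the store (a list) and does not break."""
    return lambda store: store.append(text)

def brk(store):
    """Unlabelled break."""
    return "break"

# Hypothetical switch body: case 1, default, case 2 (with break), case 3.
GROUPS = [([1], [emit("one")]), ([DEFAULT], [emit("default")]),
          ([2], [emit("two"), brk]), ([3], [emit("three")])]
```

With these groups, a switch on 2 runs only the third group even though default appears earlier, while a switch on 9 starts at default and falls through into the case 2 group.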

Looping Statements. The Java language looping constructs, the while, do and
for statements, are similar to the looping constructs of other languages and,
as such, cause difficulty for the writing of denotational semantics. Two
different approaches to specifying the semantics of loops have been presented in
the literature: the fixpoint approach and the recursive definition approach.
We have defined the do-statement and the for-statement in terms of the while
statement. Note that the expression in the for statement is optional. In the case
where it is not present, the semantics need to assume that the result is always
true; we have divided this case into two separate syntactic forms for clarity.

C⟦<WhileStmt>⟧γθσ ::=
    C⟦while ( <Expr> ) <Statement>⟧γθσ = θ₁(γ[&break ← θ], σ) where
        rec, ∀γ₁, σ₁. θ₁(γ₁, σ₁) = E⟦<Expr>⟧γ₁κσ₁ where
            ∀r, τ, σ₁. κ(r, τ, σ₁) =
                if (r == true)
                    C⟦<Statement>⟧γ₁θ₁σ₁
                else
                    θ(γ, σ₁)
                endif
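
The recursive definition of the while statement can be mirrored in Python with a loop-head continuation that refers to itself. The helpers `e_less_than`, `c_increment`, `c_while` and `halt` are assumptions for this sketch; because Python does not eliminate tail calls, the model is only suitable for short loops.

```python
def e_less_than(var, limit):
    """Expression: store[var] < limit, passed on as (value, type, store)."""
    return lambda env, kappa, store: kappa(store[var] < limit, "Z", store)

def c_increment(var):
    """Command: add one to store[var] and continue."""
    return lambda env, theta, store: theta(env, {**store, var: store[var] + 1})

def c_while(expr, stmt):
    """while ( expr ) stmt, with theta1 as the recursively defined loop head."""
    def command(env, theta, store):
        def theta1(env1, store1):
            def kappa(r, tau, store2):
                if r is True:
                    # Run the body, then come back to the loop head.
                    return stmt(env1, theta1, store2)
                return theta(env, store2)  # leave the loop
            return expr(env1, kappa, store1)
        # An unlabelled break inside the body would jump to theta.
        return theta1({**env, "&break": theta}, store)
    return command

# Final continuation for experiments: return the store that reaches the end.
halt = lambda env, store: store
```

Running `c_while(e_less_than("i", 3), c_increment("i"))({}, halt, {"i": 0})` evaluates the condition four times and the body three times before handing the final store to `halt`.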

C⟦<DoStmt>⟧γθσ ::=
    C⟦do <Statement> while ( <Expr> )⟧γθσ =
        C⟦<Statement> ; while ( <Expr> ) <Statement>⟧γθσ

C⟦<ForStmt>⟧γθσ ::=
    C⟦for ( <ForInit>? ; ; <ForUpdate>? ) <Statement>⟧γθσ =
        C⟦<ForInit> ; while ( true ) <Statement> ; <ForUpdate>⟧γθσ
  | C⟦for ( <ForInit>? ; <Expr> ; <ForUpdate>? ) <Statement>⟧γθσ =
        C⟦<ForInit> ; while ( <Expr> ) <Statement> ; <ForUpdate>⟧γθσ

C⟦<ForInit>⟧γθσ ::=
    C⟦<StmtExprList>⟧γθσ
  | C⟦<LocalVarDecl>⟧γθσ

C⟦<ForUpdate>⟧γθσ ::=
    C⟦<StmtExprList>⟧γθσ

C⟦<StmtExprList>⟧γθσ ::=
    C⟦<ExprStmt>⟧γθσ
  | C⟦<StmtExprList₁> , <ExprStmt>⟧γθσ = C⟦<StmtExprList₁>⟧γθ₁σ where
        ∀γ₁, σ₁. θ₁(γ₁, σ₁) = C⟦<ExprStmt>⟧γ₁θσ₁

Misc. The following semantic functions define the behavior of the miscellaneous
syntactic commands in the Java language. The expression statement list involves
evaluation of a list of expressions, with the result values discarded. The break,
continue, return, and throw statements all evaluate their parameters and then
look up the corresponding continuation in the environment. The meaning of the
rest of the program is based on this continuation. The synchronized command
is ignored in these semantics since we do not specify concurrency.

C⟦<ExprStmtList>⟧γθσ ::=
    C⟦<ExprStmtList₁> , <ExprStmt>⟧γθσ = C⟦<ExprStmtList₁>⟧γθ₁σ where
        ∀γ₁, σ₁. θ₁(γ₁, σ₁) = E⟦<ExprStmt>⟧γ₁κσ₁ where
            ∀r, τ, σ₂. κ(r, τ, σ₂) = θ(γ₁, σ₂)

C⟦<BreakStmt>⟧γθσ ::=
    C⟦break ;⟧γθσ = θ₁(γ, σ) where
        θ₁ = γ.getComCont(&break)
  | C⟦break <Id> ;⟧γθσ = θ₁(γ, σ) where
        θ₁ = γ.getComCont(&break)(fst(V⟦Id⟧γ))

C⟦<ContStmt>⟧γθσ ::=
    C⟦continue ;⟧γθσ = θ₁(γ, σ) where
        θ₁ = γ.getComCont(&continue)
  | C⟦continue <Id> ;⟧γθσ = θ₁(γ, σ) where
        θ₁ = γ.getComCont(&continue)(fst(V⟦Id⟧γ))

C⟦<RetStmt>⟧γθσ ::=
    C⟦return ;⟧γθσ = γ.getComCont(&return)
  | C⟦return <Expr> ;⟧γθσ = E⟦<Expr>⟧γκσ where
        ∀r, τ, σ. κ(r, τ, σ) = θ₁(σ₁) where
            θ₁ = γ.getComCont(&return) and
            τ₁ = γ[&returnType] and
            r₁ = promote(τ₁, (r, τ)) and
            σ₁ = σ[&returnVal ← r₁]

C⟦<ThrowStmt>⟧γθσ ::=
    C⟦throw <Expr> ;⟧γθσ = E⟦<Expr>⟧γκσ where
        ∀r, τ, γ₁, σ₁. κ(r, τ, γ₁, σ₁) = θ₁(γ₂, σ₁) where
            γ₂ = γ₁[&thrown ← (r, τ)] and
            θ₁ = γ.[&throw]

C⟦<SynchStmt>⟧γθσ ::=
    C⟦synchronized ( <Expr> ) <Block>⟧γθσ = E⟦<Expr>⟧γκσ where
        ∀r, τ, σ₁. κ(r, τ, σ₁) = C⟦<Block>⟧γθσ₁

The try-catch statements of the Java language are an important aspect of
the language for error control. The following semantic functions capture the
meaning of these statements. A try block is executed until a throw command is
executed; at that point execution continues based on the continuation stored
in the environment. This continuation consists of the execution of the finally
block followed by evaluation of the catch parameter. If the thrown exception
matches the formal parameter, the catch clause is executed and the program
continues using the continuation from the commands following the try block. If
none of the catch clauses match, then the throw propagates upward.
C⟦<TryStmt>⟧γθσ ::=
    C⟦try <Block> <Catches>⟧γθσ = C⟦<Block>⟧γ₁θ₁σ where
        γ₁ = γ[&throw ← θ₂] and
        ∀γ₂, σ₂. θ₁(γ₂, σ₂) = θ(γ, σ₂) and
        ∀γ₂, σ₂. θ₂(γ₂, σ₂) = C⟦<Catches>⟧γ₂θ₃σ₂ where
            ∀γ₃, σ₃. θ₃(γ₃, σ₃) =
                if (γ₃(&thrown) == (null, “V”)) then
                    θ(γ₃, σ₃)
                else
                    (γ₃.[&throw])(γ₃, σ₃)
                endif
  | C⟦try <Block> <Catches>? <Finally>⟧γθσ = C⟦<Block>⟧γ₁θ₁σ where
        γ₁ = γ[&throw ← θ₂] and
        ∀γ₂, σ₂. θ₁(γ₂, σ₂) = C⟦<Finally>⟧γθσ₂ and
        ∀γ₂, σ₂. θ₂(γ₂, σ₂) = C⟦<Finally>⟧γ₂θ₃σ₂ where
            ∀γ₃, σ₃. θ₃(γ₃, σ₃) = C⟦<Catches>⟧γ₃θ₄σ₃ where
                ∀γ₄, σ₄. θ₄(γ₄, σ₄) =
                    if (γ₄(&thrown) == (null, “V”)) then
                        θ(γ₄, σ₄)
                    else
                        (γ₄.[&throw])(γ₄, σ₄)
                    endif

C⟦<Catches>⟧γθσ ::=
    C⟦<CatchClause>⟧γθσ
  | C⟦<Catches₁> <CatchClause>⟧γθσ = C⟦<Catches₁>⟧γθ₁σ where
        ∀γ₁, σ₁. θ₁(γ₁, σ₁) = C⟦<CatchClause>⟧γ₁θσ₁

C⟦<CatchClause>⟧γθσ ::=
    C⟦catch ( <FormalParam> ) <Block>⟧γθσ =
        let (r, τ) = γ(&thrown) and
            (e, τ₁) = V⟦<FormalParam>⟧γ in
        if (τ == τ₁) then
            C⟦<Block>⟧γ₁θσ₁ where
                γ₁ = γ[&thrown ← (null, “V”)] and
                σ₁ = σ[γ(e) ← r]
        else
            θ(γ, σ)
        endif

C⟦<Finally>⟧γθσ ::=
    C⟦finally <Block>⟧γθσ = C⟦<Block>⟧γθσ
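
The interplay of the &throw continuation and the &thrown variable for a try statement without a finally block can be explored with the following Python sketch. It is a simplified model under stated assumptions: the helper names are invented, catch parameters are matched by exact type string as in the rule above, and an unmatched exception is handed to the &throw continuation that was in force outside the try statement, which is also restored after a handler completes.

```python
NO_EXCEPTION = (None, "V")  # value of &thrown when nothing is pending

def c_throw(value, tau):
    """throw: record (value, type) in &thrown and jump to the &throw continuation."""
    def command(env, theta, store):
        return env["&throw"]({**env, "&thrown": (value, tau)}, store)
    return command

def c_catch(param_type, param_name, block):
    """catch ( param_type param_name ) block."""
    def command(env, theta, store):
        value, tau = env["&thrown"]
        if tau == param_type:
            cleared = {**env, "&thrown": NO_EXCEPTION}
            return block(cleared, theta, {**store, param_name: value})
        return theta(env, store)  # not ours: try the next clause
    return command

def c_try(block, catches):
    """try block catches (no finally clause)."""
    def command(env, theta, store):
        outer_throw = env["&throw"]
        def after_catches(env3, store3):
            if env3["&thrown"] == NO_EXCEPTION:
                # Handled: continue after the try with the outer handler restored.
                return theta({**env3, "&throw": outer_throw}, store3)
            return outer_throw(env3, store3)  # unmatched: propagate
        def on_throw(env2, store2):
            k = after_catches
            for clause in reversed(catches):  # chain the clauses in textual order
                k = (lambda c, nxt: lambda e, s: c(e, nxt, s))(clause, k)
            return k(env2, store2)
        normal_exit = lambda env2, store2: theta(env, store2)
        return block({**env, "&throw": on_throw}, normal_exit, store)
    return command

# Helpers for experiments.
halt = lambda env, store: store
TOP = {"&throw": lambda env, store: ("uncaught", env["&thrown"]),
       "&thrown": NO_EXCEPTION}
record = lambda env, theta, store: theta(env, {**store, "handled": store["e"]})
```

A thrown “LMyEx;” value skips a handler for “LOther;” and is bound to the parameter of the matching handler; if no clause matches, the top-level continuation in `TOP` reports it as uncaught.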



4.13 Expressions 

Expressions in Java return either values or variables. In these semantics we have
broken these into two categories, handled by different semantic functions: E⟦⟧ for
values and L⟦⟧ for variables. The first two syntactic expressions denote constant
expressions (which must return a value) and general expressions (which also
return values). Note that the restriction that constant expressions return constant
values is a compile-time check and thus is not represented in these semantics.

E⟦<ConstantExpr>⟧γκσ ::=
    E⟦<Expr>⟧γκσ

E⟦<Expr>⟧γκσ ::=
    E⟦<AssignExpr>⟧γκσ

Assignment Expressions. There are several assignment operators in Java besides
the simple assignment. According to the JLS, the compound assignment
E₁ op= E₂ is equivalent to E₁ = (T)((E₁) op (E₂)), where T is the type of E₁ and
the expression E₁ is evaluated only once. In the semantic model, we evaluate
E₁ once to obtain its memory location for assignment in the store, and use that
location to determine the value of the expression for the operation. This
evaluation requires the expression to return a variable (actually a variable
location for use by the store) as opposed to a value. To indicate this return type
we use the location L⟦⟧ semantic functions.

E⟦<AssignExpr>⟧γκσ ::=
    E⟦<CondExpr>⟧γκσ
  | E⟦<Assign>⟧γκσ

E⟦<Assign>⟧γκσ ::=
    E⟦<LHS> <AssignOp> <AssignExpr>⟧γκσ =
        if (AssignOp == ‘+=’)
            L⟦<LHS>⟧γασ where
                ∀r₁, τ₁, l, σ₁. α(r₁, τ₁, l, σ₁) = E⟦<AssignExpr>⟧γκ₂σ₁ where
                    ∀r₂, τ₂, σ₂. κ₂(r₂, τ₂, σ₂) =
                        let τ = binaryPromoteType(τ₁, τ₂) and
                            d = cast(τ₁, (promote(τ, (r₁, τ₁)) +_τ promote(τ, (r₂, τ₂)), τ)) in
                        κ(d, τ₁, σ₂[l ← d])
            similar for -=, *=, %=, &=, ^=, |=
            where the meaning of op_τ is defined in the section on numeric expressions
        else if (AssignOp == ‘/=’)
            L⟦<LHS>⟧γασ where
                ∀r₁, τ₁, l, σ₁. α(r₁, τ₁, l, σ₁) = E⟦<AssignExpr>⟧γκ₂σ₁ where
                    ∀r₂, τ₂, σ₂. κ₂(r₂, τ₂, σ₂) =
                        let τ = binaryPromoteType(τ₁, τ₂) in
                        if (r₂ = 0 ∧ (τ = “I” ∨ τ = “J”))
                            θ(γ₁, σ₃) where
                                θ = γ.[&throw] and
                                (σ₃, r₃, τ₃) = σ.mkException(ArithmeticException) and
                                γ₁ = γ[&thrown ← (r₃, τ₃)]
                        else
                            let d = cast(τ₁, (promote(τ, (r₁, τ₁)) /_τ promote(τ, (r₂, τ₂)), τ)) in
                            κ(d, τ₁, σ₂[l ← d])
                        endif
        else if (AssignOp == ‘<<=’)
            L⟦<LHS>⟧γασ where
                ∀r₁, τ₁, l, σ₁. α(r₁, τ₁, l, σ₁) = E⟦<AssignExpr>⟧γκ₂σ₁ where
                    ∀r₂, τ₂, σ₂. κ₂(r₂, τ₂, σ₂) =
                        let τ₁′ = unaryPromoteType(τ₁) and
                            τ₂′ = unaryPromoteType(τ₂) and
                            r₁′ = promote(τ₁′, (r₁, τ₁)) and
                            r₂′ = promote(τ₂′, (r₂, τ₂)) and
                            d = leftShift((r₁′, τ₁′), (r₂′, τ₂′)) in
                        κ(d, τ₁, σ₂[l ← d])
            similar for >>=, >>>=
            where the meaning of op_τ is defined in the section on numeric expressions
        else if (AssignOp == ‘=’)
            L⟦<LHS>⟧γασ where
                ∀r₁, τ₁, l, σ₁. α(r₁, τ₁, l, σ₁) = E⟦<AssignExpr>⟧γκ₂σ₁ where
                    ∀r₂, τ₂, σ₂. κ₂(r₂, τ₂, σ₂) =
                        if (r₁ == null)
                            θ(γ₁, σ₃) where
                                θ = γ.[&throw] and
                                (σ₃, r₃, τ₃) = σ.mkException(NullPointerException) and
                                γ₁ = γ[&thrown ← (r₃, τ₃)]
                        else if (r₁ == OutOfBounds)
                            θ(γ₁, σ₃) where
                                θ = γ.[&throw] and
                                (σ₃, r₃, τ₃) = σ.mkException(IndexOutOfBoundsException) and
                                γ₁ = γ[&thrown ← (r₃, τ₃)]
                        else if not (τ₂ <_τ τ₁) then
                            θ(γ₁, σ₃) where
                                θ = γ.[&throw] and
                                (σ₃, r₃, τ₃) = σ.mkException(ArrayStoreException) and
                                γ₁ = γ[&thrown ← (r₃, τ₃)]
                        else
                            κ(r₂, τ₁, σ₂[l ← promote(τ₁, (r₂, τ₂))])
                        endif
        endif
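
To make the compound assignment rule concrete, the following Python sketch models E₁ op= E₂ for integral types only: the left-hand side is read once through its location, the operation is carried out in the promoted type, and the result is cast back to the type of the left-hand side. All names are assumptions for this sketch; only + and / are covered, and a Python exception stands in for the jump to the &throw continuation.

```python
class ArithmeticException(Exception):
    """Stands in for java.lang.ArithmeticException."""

BITS = {"B": 8, "S": 16, "I": 32, "J": 64}  # signed integral types only

def cast_integral(tau, value):
    """Narrow value to the two's-complement range of integral type tau."""
    bits = BITS[tau]
    value &= (1 << bits) - 1
    if value >= (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def binary_promote_type(tau1, tau2):
    """Binary numeric promotion restricted to the types in BITS."""
    return "J" if "J" in (tau1, tau2) else "I"

def java_div(a, b):
    """Integer division truncating toward zero, as in Java."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

OPS = {"+": lambda a, b: a + b, "/": java_div}

def compound_assign(store, loc, tau1, op, r2, tau2):
    """Model of E1 op= E2, where E1 denotes location loc of type tau1."""
    r1 = store[loc]  # E1 is evaluated once; its value is read via the location
    tau = binary_promote_type(tau1, tau2)
    if op == "/" and r2 == 0:
        raise ArithmeticException("/ by zero")
    d = cast_integral(tau1, cast_integral(tau, OPS[op](r1, r2)))
    return d, {**store, loc: d}
```

For a byte location holding 10, `compound_assign({"b": 10}, "b", "B", "+", 300, "I")` computes 310 in type int and casts it back to byte, giving 54, which is the result of `b += 300` in Java.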

<AssignOp> ::=
    = | *= | /= | %= | += | -= | <<= | >>= | >>>= | &= | ^= | |=

L⟦<LHS>⟧γασ ::=
    L⟦<Name>⟧γασ
  | L⟦<FieldAccess>⟧γασ
  | L⟦<ArrayAccess>⟧γασ

Conditional Expressions. The conditional expressions (operator ?:) of the Java
language are the only expressions that do not guarantee that all subexpressions
are evaluated. The regular conditional expression is a choice operation that
evaluates either the second or the third subexpression, based on the boolean
result of the first subexpression. The type of the expression is derived from
the types of the two possible result subexpressions. The conditional-or
expression (operator ||) and the conditional-and expression (operator &&) are
short-circuit boolean expressions that evaluate the second subexpression only
if the result of the first subexpression does not determine the result of the
expression (i.e., they short-circuit on true for or and on false for and).
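
As a small illustration of the evaluation order described above, the following sketch records which operands are actually evaluated. The helper `eval` and the class name are assumptions made for this example only.

```java
// Illustrative sketch only; names are hypothetical.
class ShortCircuitDemo {
    static final StringBuilder trace = new StringBuilder();

    // Records that an operand was evaluated, then returns its value.
    static boolean eval(String label, boolean value) {
        trace.append(label);
        return value;
    }

    static String run() {
        trace.setLength(0);
        boolean or = eval("a", true) || eval("b", false);    // "b" is never evaluated
        boolean and = eval("c", false) && eval("d", true);   // "d" is never evaluated
        boolean cond = eval("e", true) ? eval("f", true)
                                       : eval("g", false);   // "g" is never evaluated
        return trace + ":" + or + and + cond;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```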

ℰ⟦<CondExpr>⟧ γ κ σ ::=
    ℰ⟦<CondOrExpr>⟧ γ κ σ
  | ℰ⟦<CondOrExpr> ? <Expr> : <CondExpr₁>⟧ γ κ σ =
      ℰ⟦<CondOrExpr>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) =
        if (r₁ == true)
          ℰ⟦<Expr>⟧ γ κ₂ σ₁
        else
          ℰ⟦<CondExpr₁>⟧ γ κ₂ σ₁
        endif
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) =
        κ(promote(τ,(r₂,τ₂)), τ, σ₂) where
          τ = env.condTypeOf(τ_t,v_t,τ_f,v_f) and
          v_t = compile-time value of <Expr> and
          τ_t = type of <Expr> and
          τ_f = type of <CondExpr₁> and
          v_f = compile-time value of <CondExpr₁>

where condTypeOf is computed as follows:

γ.condTypeOf(τ_t,v_t,τ_f,v_f) =
  if (τ_t == τ_f)
    τ_t
  else if (isNumeric(τ_t) and isNumeric(τ_f))
    if ((τ_t == “B” and τ_f == “S”) or (τ_f == “B” and τ_t == “S”))
      “S”
    else if (τ_t ∈ {“B”, “S”, “C”} and v_f ∈ τ_t)
      τ_t
    else if (τ_f ∈ {“B”, “S”, “C”} and v_t ∈ τ_f)
      τ_f
    else
      binaryPromotionType(τ_t, τ_f)
    endif
  else if (γ.assnCompatible(τ_t,τ_f))
    τ_f
  else
    τ_t
  endif
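
The numeric cases of this type computation can be checked against a Java compiler. In the sketch below (all names are illustrative), overload resolution reveals the static type that the compiler assigns to each conditional expression; the cases correspond to the byte/short rule, the constant-fits-in-type rule, and binary promotion.

```java
// Illustrative sketch only; CondTypeDemo and kind are hypothetical names.
class CondTypeDemo {
    static String kind(byte x)   { return "byte"; }
    static String kind(short x)  { return "short"; }
    static String kind(char x)   { return "char"; }
    static String kind(int x)    { return "int"; }
    static String kind(double x) { return "double"; }

    static String[] run(boolean flag) {
        byte b = 1; short s = 2; char c = 'x'; int i = 3;
        return new String[] {
            kind(flag ? b : s),    // byte and short give short
            kind(flag ? c : 65),   // char and an int constant that fits in char give char
            kind(flag ? c : i),    // char and a non-constant int are promoted to int
            kind(flag ? i : 1.5)   // int and double are promoted to double
        };
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", run(true)));
    }
}
```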

ℰ⟦<CondOrExpr>⟧ γ κ σ ::=
    ℰ⟦<CondAndExpr>⟧ γ κ σ
  | ℰ⟦<CondOrExpr₁> || <CondAndExpr>⟧ γ κ σ =
      ℰ⟦<CondOrExpr₁>⟧ γ κ₁ σ where
      ∀r,τ,σ₁. κ₁(r,τ,σ₁) =
        if (r == true)
          κ(r,τ,σ₁)
        else
          ℰ⟦<CondAndExpr>⟧ γ κ σ₁
        endif

ℰ⟦<CondAndExpr>⟧ γ κ σ ::=
    ℰ⟦<IncOrExpr>⟧ γ κ σ
  | ℰ⟦<CondAndExpr₁> && <IncOrExpr>⟧ γ κ σ =
      ℰ⟦<CondAndExpr₁>⟧ γ κ₁ σ where
      ∀r,τ,σ₁. κ₁(r,τ,σ₁) =
        if (r == false)
          κ(r,τ,σ₁)
        else
          ℰ⟦<IncOrExpr>⟧ γ κ σ₁
        endif

Bitwise and Boolean Expressions. The following expressions all return boolean
results, with the exception of the first three (and, or and xor), which perform
bitwise operations on integral operands and logical operations on boolean operands.
The comparison expressions can work with any operands of compatible types
and thus require a more extensive definition. The results of these operations
are rather complex for floating point values and have been defined in tables to
simplify the presentation. Shift and comparison operations that are similar have
been omitted from this presentation for space considerations.
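
To make the dual role of &, | and ^ concrete, here is a short sketch (names are illustrative). On boolean operands the operators are logical and always evaluate both sides; on integral operands they work bit by bit after binary promotion.

```java
// Illustrative sketch only; names are hypothetical.
class BitwiseDemo {
    static int calls = 0;

    static boolean touch() {
        calls++;
        return true;
    }

    static String run() {
        calls = 0;
        boolean z = false & touch();        // logical and: touch() still runs
        int bits = (12 & 10) | (12 ^ 10);   // (1100 & 1010) | (1100 ^ 1010) = 8 | 6
        byte b = 1;
        long l = 2L;
        long promoted = b | l;              // operands promoted to long before the or
        return z + "," + calls + "," + bits + "," + promoted;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```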

ℰ⟦<IncOrExpr>⟧ γ κ σ ::=
    ℰ⟦<XORExpr>⟧ γ κ σ
  | ℰ⟦<IncOrExpr₁> | <XORExpr>⟧ γ κ σ = ℰ⟦<IncOrExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<XORExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(r₁ or⊥ r₂, τ, σ₂) where
        if (τ₁ == “Z”)
          τ = “Z”
        else
          τ = binaryPromoteType(τ₁,τ₂)
        endif

ℰ⟦<XORExpr>⟧ γ κ σ ::=
    ℰ⟦<AndExpr>⟧ γ κ σ
  | ℰ⟦<XORExpr₁> ^ <AndExpr>⟧ γ κ σ = ℰ⟦<XORExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<AndExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(r₁ xor⊥ r₂, τ, σ₂) where
        if (τ₁ == “Z”)
          τ = “Z”
        else
          τ = binaryPromoteType(τ₁,τ₂)
        endif

ℰ⟦<AndExpr>⟧ γ κ σ ::=
    ℰ⟦<EqualExpr>⟧ γ κ σ
  | ℰ⟦<AndExpr₁> & <EqualExpr>⟧ γ κ σ = ℰ⟦<AndExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<EqualExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(r₁ and⊥ r₂, τ, σ₂) where
        if (τ₁ == “Z”)
          τ = “Z”
        else
          τ = binaryPromoteType(τ₁,τ₂)
        endif

ℰ⟦<EqualExpr>⟧ γ κ σ ::=
    ℰ⟦<RelatExpr>⟧ γ κ σ
  | ℰ⟦<EqualExpr₁> == <RelatExpr>⟧ γ κ σ =
      ℰ⟦<EqualExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<RelatExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q, boolean, σ₂) where
        if (isNumeric(τ₁))
          let τ = binaryPromoteType(τ₁,τ₂) and
              r₁′ = promote(τ,(r₁,τ₁)) and
              r₂′ = promote(τ,(r₂,τ₂)) in
          if (τ = “F” or τ = “D”)
            if (r₁′ == NAN or r₂′ == NAN)
              q = false
            else if (|r₁′| == 0 and |r₂′| == 0)
              q = true
            else if (r₁′ == r₂′)
              q = true
            endif
          else if (r₁′ == r₂′)
            q = true
          endif
        else if (τ₁ == “Z”)
          q = (r₁ == r₂)
        else // must be ref types
        endif
  | ℰ⟦<EqualExpr₁> != <RelatExpr>⟧ γ κ σ =
      ℰ⟦!(<EqualExpr₁> == <RelatExpr>)⟧ γ κ σ

ℰ⟦<RelatExpr>⟧ γ κ σ ::=
    ℰ⟦<ShiftExpr>⟧ γ κ σ
  | ℰ⟦<RelatExpr₁> < <ShiftExpr>⟧ γ κ σ = ℰ⟦<RelatExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<ShiftExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q, “Z”, σ₂) where
        let τ = binaryPromoteType(τ₁,τ₂) and
            r₁′ = promote(τ,(r₁,τ₁)) and
            r₂′ = promote(τ,(r₂,τ₂)) in
        q = r₁′ <τ r₂′

    similar for >, <= and >=,
    with the understanding that positive and negative 0 are equal.

  | ℰ⟦<RelatExpr₁> instanceof <RefType>⟧ γ κ σ = ℰ⟦<RelatExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) =
        ℰ⟦<RefType>⟧ γ κ₂ σ₁ where
        ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(γ.instanceof(τ₁,τ₂), “Z”, σ₂)



we compute the result of r₁ <τ r₂ using the following table

Computation of r₁ <τ r₂

            |                       r₂
  r₁        | NAN     ∞       −∞      |0|          other
  ----------+---------------------------------------------------
  NAN       | false   false   false   false        false
  ∞         | false   false   false   false        false
  −∞        | false   true    false   true         true
  |0|       | false   true    false   false        r₁ <τ⊥ r₂
  other     | false   true    false   r₁ <τ⊥ r₂    r₁ <τ⊥ r₂


where <τ⊥ is normal comparison
(using either IEEE 754, or two's complement arithmetic)

32 or 64 bit computation is based on the value of τ

this is a strict extension of normal comparison

IEEE underflow or overflow returns an ∞ or a 0 value

two's complement overflow or underflow returns the low order bits of the result
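
The special cases of equality and ordering on floating point values can be checked with a few Java comparisons. This is only a sketch with an illustrative class name: NaN compares false against everything (including itself), positive and negative zero are equal, and the infinities are ordered as expected.

```java
// Illustrative sketch only; FloatCompareDemo is a hypothetical name.
class FloatCompareDemo {
    static boolean[] run() {
        double nan = Double.NaN;
        double inf = Double.POSITIVE_INFINITY;
        double ninf = Double.NEGATIVE_INFINITY;
        return new boolean[] {
            nan == nan,    // NaN is not equal to itself
            nan < 1.0,     // any ordering comparison with NaN is false
            0.0 == -0.0,   // positive and negative zero are equal
            -0.0 < 0.0,    // and therefore not ordered
            ninf < inf,    // negative infinity is below positive infinity
            inf < inf      // nothing is strictly below itself
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```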

Numeric Expressions. The numeric expressions take numeric operands and produce
numeric results. Again, the use of floating point values greatly complicates
the specification of operations such as addition and multiplication, which are
therefore defined in tables to simplify the presentation. We also define
subtraction in terms of addition. There is an oversight in the JLS involving
multiplication of infinity values. Consistent with the JDK we define
∞ * ∞ == ∞ and −∞ * ∞ == ∞ * −∞ == −∞.
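
The convention for products of infinities, and the treatment of subtraction as addition of the negated operand, can be checked on a current JDK with a sketch such as the following (the class name is illustrative).

```java
// Illustrative sketch only; InfinityProductDemo is a hypothetical name.
class InfinityProductDemo {
    static double[] run() {
        double inf = Double.POSITIVE_INFINITY;
        return new double[] {
            inf * inf,      // positive infinity
            -inf * inf,     // negative infinity
            inf * -inf,     // negative infinity
            inf * 0.0,      // NaN: infinity times zero is undefined
            5.0 - 3.0,      // subtraction ...
            5.0 + (-3.0)    // ... gives the same value as adding the negation
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```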

ℰ⟦<ShiftExpr>⟧ γ κ σ ::=
    ℰ⟦<AddExpr>⟧ γ κ σ
  | ℰ⟦<ShiftExpr₁> << <AddExpr>⟧ γ κ σ = ℰ⟦<ShiftExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<AddExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q, τ₁′, σ₂) where
        let τ₁′ = unaryPromoteType(τ₁) and
            τ₂′ = unaryPromoteType(τ₂) and
            r₁′ = promote(τ₁′,(r₁,τ₁)) and
            r₂′ = promote(τ₂′,(r₂,τ₂)) in
        q = leftShift((r₁′,τ₁′),(r₂′,τ₂′))
    similar for >> and >>>

ℰ⟦<AddExpr>⟧ γ κ σ ::=
    ℰ⟦<MultExpr>⟧ γ κ σ
  | ℰ⟦<AddExpr₁> + <MultExpr>⟧ γ κ σ = ℰ⟦<AddExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<MultExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q,τ,σ₂) where
        if (τ₁ == “Ljava.lang.String;” or τ₂ == “Ljava.lang.String;”)
          τ = “Ljava.lang.String;” and
          q = r₁ +τ r₂
        else
          τ = binaryPromoteType(τ₁,τ₂) and
          let r₁′ = promote(τ,(r₁,τ₁)) and
              r₂′ = promote(τ,(r₂,τ₂)) in
          q = (r₁′ +τ r₂′)
        endif
  | ℰ⟦<AddExpr₁> - <MultExpr>⟧ γ κ σ = ℰ⟦<AddExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<MultExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q,τ,σ₂) where
        τ = binaryPromoteType(τ₁,τ₂) and
        let r₁′ = promote(τ,(r₁,τ₁)) and
            r₂′ = promote(τ,(r₂,τ₂)) in
        q = ((r₁′,τ₁) +τ (−r₂′,τ₂))

  where we define (r₁,τ₁) +τ (r₂,τ₂) =
    if (τ == “Ljava.lang.String;”)
      String(r₁,τ₁) + String(r₂,τ₂)
    else
      compute the result using the following table
    endif



Computation of r₁ +τ r₂

            |                        r₁
  r₂        | NAN    ∞      −∞     0      −0     other
  ----------+------------------------------------------------
  NAN       | NAN    NAN    NAN    NAN    NAN    NAN
  ∞         | NAN    ∞      NAN    ∞      ∞      ∞
  −∞        | NAN    NAN    −∞     −∞     −∞     −∞
  0         | NAN    ∞      −∞     0      0      r₁
  −0        | NAN    ∞      −∞     0      −0     r₁
  other     | NAN    ∞      −∞     r₂     r₂     r₁ +τ⊥ r₂



where +τ⊥ is normal addition
(using either IEEE 754, or two's complement arithmetic)

32 or 64 bit computation is based on the value of τ

this is a strict extension of normal addition

IEEE underflow or overflow returns an ∞ or a 0 value

two's complement overflow or underflow returns the low order bits of the result
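
A few rows of the addition table, together with the overflow notes, can be reproduced in Java. The following is a sketch with an illustrative class name; the string case shows that a String operand turns + into concatenation.

```java
// Illustrative sketch only; AdditionDemo is a hypothetical name.
class AdditionDemo {
    static String run() {
        int wrapped = Integer.MAX_VALUE + 1;                    // keeps the low order 32 bits
        double overflow = Double.MAX_VALUE + Double.MAX_VALUE;  // IEEE overflow gives infinity
        double undefined = Double.POSITIVE_INFINITY
                         + Double.NEGATIVE_INFINITY;            // NaN
        double mixedZeros = 0.0 + -0.0;                         // positive zero
        double negZeros = -0.0 + -0.0;                          // negative zero
        String concat = "n=" + 1 + 2;                           // left to right: "n=1", then "n=12"
        return wrapped + "|" + overflow + "|" + undefined + "|"
             + mixedZeros + "|" + negZeros + "|" + concat;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```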

ℰ⟦<MultExpr>⟧ γ κ σ ::=
    ℰ⟦<UnaryExpr>⟧ γ κ σ
  | ℰ⟦<MultExpr₁> * <UnaryExpr>⟧ γ κ σ = ℰ⟦<MultExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<UnaryExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q,τ,σ₂) where
        τ = binaryPromoteType(τ₁,τ₂) and
        let r₁′ = promote(τ,(r₁,τ₁)) and
            r₂′ = promote(τ,(r₂,τ₂)) in
        q = (r₁′ *τ r₂′)
  | ℰ⟦<MultExpr₁> / <UnaryExpr>⟧ γ κ σ = ℰ⟦<MultExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<UnaryExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q,τ,σ₂) where
        τ = binaryPromoteType(τ₁,τ₂) and
        let r₁′ = promote(τ,(r₁,τ₁)) and
            r₂′ = promote(τ,(r₂,τ₂)) in
        if (|r₂′| == 0)
          θ(γ₁,σ₃) where
            θ = γ[&throw] and
            (σ₃,r₃,τ₃) = σ.mkException(ArithmeticException) and
            γ₁ = γ[&thrown ← (r₃,τ₃)]
        else
          q = (r₁′ /τ r₂′)
        endif
  | ℰ⟦<MultExpr₁> % <UnaryExpr>⟧ γ κ σ = ℰ⟦<MultExpr₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<UnaryExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q,τ,σ₂) where
        τ = binaryPromoteType(τ₁,τ₂) and
        let r₁′ = promote(τ,(r₁,τ₁)) and
            r₂′ = promote(τ,(r₂,τ₂)) in
        q = (r₁′ %τ r₂′)

where we compute r₁ *τ r₂ using the following table



Computation of r₁ *τ r₂

            |                        r₁
  r₂        | NAN    ∞      −∞     0      −0     other
  ----------+------------------------------------------------
  NAN       | NAN    NAN    NAN    NAN    NAN    NAN
  ∞         | NAN    ∞      −∞     NAN    NAN    (s)∞
  −∞        | NAN    −∞     ∞      NAN    NAN    (s)∞
  0         | NAN    NAN    NAN    0      −0     (s)0
  −0        | NAN    NAN    NAN    −0     0      (s)0
  other     | NAN    (s)∞   (s)∞   (s)0   (s)0   r₁ *τ⊥ r₂



where *τ⊥ is normal multiplication
(using either IEEE 754, or two's complement arithmetic)

32 or 64 bit computation is based on the value of τ

this is a strict extension of normal multiplication

IEEE underflow or overflow returns an ∞ or a 0 value

two's complement overflow or underflow returns the low order bits of the result
(s) represents the sign of the result, which is positive if both
r₁ and r₂ have the same sign and negative otherwise

where we compute r₁ /τ r₂ using the following table



Computation of r₁ /τ r₂

            |                        r₁
  r₂        | NAN    ∞      −∞     0      −0     other
  ----------+------------------------------------------------
  NAN       | NAN    NAN    NAN    NAN    NAN    NAN
  ∞         | NAN    NAN    NAN    0      −0     (s)0
  −∞        | NAN    NAN    NAN    −0     0      (s)0
  0         | NAN    ∞      −∞     NAN    NAN    (s)∞
  −0        | NAN    −∞     ∞      NAN    NAN    (s)∞
  other     | NAN    (s)∞   (s)∞   (s)0   (s)0   r₁ /τ⊥ r₂



where /τ⊥ is normal division
(using either IEEE 754, or two's complement arithmetic)

32 or 64 bit computation is based on the value of τ

this is a strict extension of normal division

IEEE underflow or overflow returns an ∞ or a 0 value

two's complement overflow or underflow returns the low order bits of the result
(s) represents the sign of the result, which is positive if both
r₁ and r₂ have the same sign and negative otherwise
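
For comparison with the division clause and table, the sketch below shows how a current JDK behaves: an integral division by zero raises ArithmeticException, while floating point division by a zero follows the table and yields a signed infinity or NaN. The class name is illustrative.

```java
// Illustrative sketch only; DivisionDemo is a hypothetical name.
class DivisionDemo {
    static String run() {
        String intResult;
        try {
            int zero = 0;
            intResult = String.valueOf(1 / zero);   // integral division by zero throws
        } catch (ArithmeticException e) {
            intResult = "ArithmeticException";
        }
        double posInf = 1.0 / 0.0;    // other / 0  gives (s) infinity
        double negInf = 1.0 / -0.0;   // other / -0 gives (s) infinity with opposite sign
        double nan = 0.0 / 0.0;       // 0 / 0 gives NaN
        return intResult + "|" + posInf + "|" + negInf + "|" + nan;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```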

where we compute r₁ %τ r₂ using the following table

Computation of r₁ %τ r₂

            |                        r₁
  r₂        | NAN    ∞      −∞     0      −0     other
  ----------+------------------------------------------------
  NAN       | NAN    NAN    NAN    NAN    NAN    NAN
  ∞         | NAN    NAN    NAN    0      −0     r₁
  −∞        | NAN    NAN    NAN    0      −0     r₁
  0         | NAN    NAN    NAN    NAN    NAN    NAN
  −0        | NAN    NAN    NAN    NAN    NAN    NAN
  other     | NAN    NAN    NAN    0      −0     r₁ %τ⊥ r₂


where %τ⊥ is the remainder after integer division
(using C/C++ style remainder, or two's complement arithmetic)

Does NOT follow the IEEE 754 remainder operation,
but rather the C/C++ style integer remainder operation
Floating point underflow or overflow returns an ∞ or a 0 value
two's complement overflow or underflow returns the low order bits of the result
(s) represents the sign of the result, which is positive if both
r₁ and r₂ have the same sign and negative otherwise
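
The difference between Java's remainder and the IEEE 754 remainder can be seen in a short sketch (the class name is illustrative; `Math.IEEEremainder` is the standard library method for the IEEE operation).

```java
// Illustrative sketch only; RemainderDemo is a hypothetical name.
class RemainderDemo {
    static String run() {
        int truncating = -7 % 3;                         // sign follows the dividend
        double fpRem = -7.5 % 2.0;                       // truncating quotient -3
        double ieeeRem = Math.IEEEremainder(-7.5, 2.0);  // nearest quotient -4
        double byInf = 5.0 % Double.POSITIVE_INFINITY;   // dividend is returned unchanged
        double byZero = 5.0 % 0.0;                       // NaN
        return truncating + "|" + fpRem + "|" + ieeeRem + "|" + byInf + "|" + byZero;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```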
4.14 Location Expressions

All Java expressions return either a value, a variable, or void (for method invocations
that return no value). Unary and primary expressions are the only expressions that can
return a variable (a location in the store). As such, we use the location semantic function
to evaluate unary and primary expressions. However, to maintain consistency in the
grammar, we have included regular expression productions and semantics interleaved
with the location expressions.

Unary Expressions. Unary expressions involve changing the sign or type of an
expression, or incrementing or decrementing a value. In the case of pre- or post-increment
or decrement operations, the expression has a definite side effect on the store, as is
indicated in the semantics. Note that the return value of the expression indicates the
pre or post nature of the expression. If the unary (or primary) expression does not
return a variable, then undef is returned as the location.
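
The following sketch (with an illustrative class name) shows the value returned and the side effect on the store for the increment forms, the identity used for bitwise complement, and unary promotion.

```java
// Illustrative sketch only; UnaryDemo is a hypothetical name.
class UnaryDemo {
    static int[] run() {
        int x = 5;
        int post = x++;     // yields the old value 5; the store now holds 6
        int pre = ++x;      // the store is updated to 7, and 7 is the value
        int compl = ~x;     // same value as (-x) - 1
        byte b = 10;
        int promoted = -b;  // unary promotion: the result has type int
        return new int[] { post, pre, x, compl, (-x) - 1, promoted };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```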

ℰ⟦<UnaryExpr>⟧ γ κ σ = ℒ⟦<UnaryExpr>⟧ γ α σ where
  ∀r,τ,l,σ₁. α(r,τ,l,σ₁) = κ(r,τ,σ₁)

ℒ⟦<UnaryExpr>⟧ γ α σ ::=
    ℰ⟦<PreIncExpr>⟧ γ κ σ
  | ℰ⟦<PreDecExpr>⟧ γ κ σ
  | ℰ⟦<UnaryExprNotPlusMinus>⟧ γ κ σ
  | ℒ⟦+ <UnaryExpr₁>⟧ γ α σ = ℒ⟦<UnaryExpr₁>⟧ γ α₁ σ where
      ∀r,τ,l,σ₁. α₁(r,τ,l,σ₁) = α(r, τ, undef, σ₁)
  | ℒ⟦- <UnaryExpr₁>⟧ γ α σ = ℒ⟦<UnaryExpr₁>⟧ γ α₁ σ where
      ∀r,τ,l,σ₁. α₁(r,τ,l,σ₁) = α(0 −τ⊥ r, τ, undef, σ₁)

ℰ⟦<UnaryExprNotPlusMinus>⟧ γ κ σ ::=
    ℰ⟦<PostExpr>⟧ γ κ σ
  | ℰ⟦<CastExpr>⟧ γ κ σ
  | ℰ⟦~ <UnaryExpr>⟧ γ κ σ = ℒ⟦<UnaryExpr>⟧ γ α σ where
      ∀r,τ,l,σ₁. α(r,τ,l,σ₁) =
        let τ₁ = unaryPromoteType(τ) and
            r₁ = promote(τ₁,(r,τ)) in
        κ((−r₁) − 1, τ, σ₁)
  | ℰ⟦! <UnaryExpr>⟧ γ κ σ = ℒ⟦<UnaryExpr>⟧ γ α σ where
      ∀r,τ,l,σ₁. α(r,τ,l,σ₁) =
        if (r == true)
          κ(false, τ, σ₁)
        else
          κ(true, τ, σ₁)
        endif

We have reverted to the non-LALR(1) grammar for cast expressions to simplify the
presentation of the semantics. Specifically, the return type of the expression is the type 
of the cast (given that no error occurs), and the return value is the converted value of 
the expression. 
ℰ⟦<CastExpr>⟧ γ κ σ ::=
    ℰ⟦( <PrimType> <Dims>? ) <UnaryExpr>⟧ γ κ σ = ℰ⟦<UnaryExpr>⟧ γ κ₁ σ where
      let τ′ = 𝒯⟦<PrimType>⟧ γ and
          d = fst(𝒱⟦<Dims>⟧ γ) and
          τ = mkArrayType(τ′, d) in
      ∀r,τ₁,σ₁. κ₁(r,τ₁,σ₁) = κ(r₁, τ, σ₁) where
        r₁ = cast(τ,(r,τ₁))
  | ℰ⟦( <RefType> ) <UnaryExprNotPlusMinus>⟧ γ κ σ =
      ℰ⟦<UnaryExprNotPlusMinus>⟧ γ κ₁ σ where
      let τ = 𝒯⟦<RefType>⟧ γ in
      ∀r,τ₁,σ₁. κ₁(r,τ₁,σ₁) =
        if (not (env.assnCompatible(τ,τ₁) or
                 env.assnCompatible(τ₁,τ)))
          θ(γ₁,σ₂) where
            θ = γ[&throw] and
            (σ₂,r₂,τ₂) = σ.mkException(CastConversionException) and
            γ₁ = γ[&thrown ← (r₂,τ₂)]
        else
          κ(r₁, τ, σ₁) where
            r₁ = cast(τ,(r,τ₁))
        endif
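
As an illustration of the two cast productions, the sketch below narrows primitive values and performs a reference cast that fails at run time. The class name is illustrative; note that the JDK reports the failure as ClassCastException, whereas the semantics above names the exception CastConversionException.

```java
// Illustrative sketch only; CastDemo is a hypothetical name.
class CastDemo {
    static String run() {
        int narrowed = (byte) 300;    // keeps the low order 8 bits: 300 - 256
        int truncated = (int) 3.99;   // rounds toward zero
        Object o = "text";
        String ok = (String) o;       // run-time type is compatible
        String failure;
        try {
            Integer bad = (Integer) o; // run-time type String is not an Integer
            failure = "none";
        } catch (ClassCastException e) {
            failure = "ClassCastException";
        }
        return narrowed + "|" + truncated + "|" + ok + "|" + failure;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```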

In the JLS there is discussion that (p)++ is a valid postfix operation (“(p)++
can make sense only as a postfix increment of p”). However, in the JDK, any
parenthesized expression returns only a value and not a variable. We follow that
convention here.

ℒ⟦<PostIncExpr>⟧ γ α σ ::=
    ℒ⟦<PostExpr> ++⟧ γ α σ = ℒ⟦<PostExpr>⟧ γ α₁ σ where
      ∀r,τ₁,l,σ₁. α₁(r,τ₁,l,σ₁) =
        let τ = binaryPromotionType(τ₁, “I”) and
            r₁ = promote(τ,(r,τ₁)) and
            r₂ = promote(τ,(1, “I”)) and
            q = cast(τ₁,(r₁ +τ⊥ r₂, τ)) in
        α(r, τ₁, undef, σ₁[l ← q])

ℒ⟦<PostDecExpr>⟧ γ α σ ::=
    ℒ⟦<PostExpr> --⟧ γ α σ = ℒ⟦<PostExpr>⟧ γ α₁ σ where
      ∀r,τ₁,l,σ₁. α₁(r,τ₁,l,σ₁) =
        let τ = binaryPromotionType(τ₁, “I”) and
            r₁ = promote(τ,(r,τ₁)) and
            r₂ = promote(τ,(−1, “I”)) and
            q = cast(τ₁,(r₁ +τ⊥ r₂, τ)) in
        α(r, τ₁, undef, σ₁[l ← q])

ℒ⟦<PostExpr>⟧ γ α σ ::=
    ℒ⟦<Primary>⟧ γ α σ
  | ℒ⟦<Name>⟧ γ α σ
  | ℒ⟦<PostIncExpr>⟧ γ α σ
  | ℒ⟦<PostDecExpr>⟧ γ α σ

ℒ⟦<PreIncExpr>⟧ γ α σ ::=
    ℒ⟦++ <UnaryExpr>⟧ γ α σ = ℒ⟦<UnaryExpr>⟧ γ α₁ σ where
      ∀r,τ₁,l,σ₁. α₁(r,τ₁,l,σ₁) =
        let τ = binaryPromotionType(τ₁, “I”) and
            r₁ = promote(τ,(r,τ₁)) and
            r₂ = promote(τ,(1, “I”)) and
            q = cast(τ₁,(r₁ +τ⊥ r₂, τ)) in
        α(q, τ₁, undef, σ₁[l ← q])

ℒ⟦<PreDecExpr>⟧ γ α σ ::=
    ℒ⟦-- <UnaryExpr>⟧ γ α σ = ℒ⟦<UnaryExpr>⟧ γ α₁ σ where
      ∀r,τ₁,l,σ₁. α₁(r,τ₁,l,σ₁) =
        let τ = binaryPromotionType(τ₁, “I”) and
            r₁ = promote(τ,(r,τ₁)) and
            r₂ = promote(τ,(−1, “I”)) and
            q = cast(τ₁,(r₁ +τ⊥ r₂, τ)) in
        α(q, τ₁, undef, σ₁[l ← q])

Primary Expressions. These expressions are the base expressions of the Java language,
providing access to variables, fields, methods, arrays and new object instances.

ℒ⟦<Primary>⟧ γ α σ ::=
    ℒ⟦<PrimaryNoNewArray>⟧ γ α σ
  | ℒ⟦<ArrayCreationExpr>⟧ γ α σ = ℰ⟦<ArrayCreationExpr>⟧ γ κ σ where
      ∀r,τ,σ₁. κ(r,τ,σ₁) = α(r, τ, undef, σ₁)

ℒ⟦<PrimaryNoNewArray>⟧ γ α σ ::=
    ℒ⟦<Literal>⟧ γ α σ =
      let (r,τ) = 𝒱⟦<Literal>⟧ γ in
      α(r, τ, undef, σ)
  | ℒ⟦this⟧ γ α σ = α(γ[&thisObject], γ[&thisClass], undef, σ)
  | ℒ⟦( <Expr> )⟧ γ α σ = ℰ⟦<Expr>⟧ γ κ σ where
      ∀r,τ,σ₁. κ(r,τ,σ₁) = α(r, τ, undef, σ₁)
  | ℒ⟦<ClassInstCreationExpr>⟧ γ α σ
  | ℒ⟦<FieldAccess>⟧ γ α σ
  | ℒ⟦<MethodInv>⟧ γ α σ = ℰ⟦<MethodInv>⟧ γ κ σ where
      ∀r,τ,σ₁. κ(r,τ,σ₁) = α(r, τ, undef, σ₁)
  | ℒ⟦<ArrayAccess>⟧ γ α σ

Array creation expressions are responsible for the creation of a new array of values.
Specifically, they allocate space in the store for the array, and then initialize all of the
elements of the array based on the default initializer for the array elements. Note that
if these elements are reference types, they are initialized to null. This expression
returns only a value, the reference to the array, and not a location.

ℰ⟦<ArrayCreationExpr>⟧ γ κ σ ::=
    ℰ⟦new <PrimType> <DimExprList> <Dims>?⟧ γ κ σ =
      ℰ⟦<DimExprList>⟧ γ κ₁ σ where
      ∀r,τ,σ₁. κ₁(r,τ,σ₁) = κ(q, τ₁, σ₂) where
        v = fst(𝒱⟦<Dims>⟧ γ) and
        τ_p = 𝒯⟦<PrimType>⟧ γ and
        τ₁ = mkArrayType(τ, v) and
        (σ₂,q) = σ₁.allocateArray(τ₁,τ_p)
  | ℰ⟦new <ClassInterfaceType> <DimExprList> <Dims>?⟧ γ κ σ =
      ℰ⟦<DimExprList>⟧ γ κ₁ σ₁ where
      σ₁ = γ.classLoader(fst(𝒱⟦<ClassInterfaceType>⟧), σ) and
      ∀r,τ,σ₁. κ₁(r,τ,σ₁) = κ(q, τ₁, σ₂) where
        v = fst(𝒱⟦<Dims>⟧ γ) and
        τ_p = 𝒯⟦<ClassInterfaceType>⟧ γ and
        τ₁ = mkArrayType(τ, v) and
        (σ₂,q) = σ₁.allocateArray(τ₁,τ_p)
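
The default initialization performed by array creation can be observed directly. The sketch below uses an illustrative class name; the negative-size case is standard Java behaviour that the clauses above do not spell out.

```java
// Illustrative sketch only; ArrayCreationDemo is a hypothetical name.
class ArrayCreationDemo {
    static String run() {
        int[] ints = new int[3];             // elements default to 0
        boolean[] flags = new boolean[1];    // elements default to false
        String[][] grid = new String[2][];   // one dimension expression, one empty [] pair
        String negative;
        try {
            int size = -1;
            int[] bad = new int[size];       // a negative dimension is rejected at run time
            negative = "created " + bad.length;
        } catch (NegativeArraySizeException e) {
            negative = "NegativeArraySizeException";
        }
        // grid[0] is a reference-typed element, so it was initialized to null.
        return ints[0] + "|" + flags[0] + "|" + grid.length + "|" + grid[0] + "|" + negative;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```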

The following semantics denote field access. Specifically, these semantics look up
the named field in the environment and return the field's location and type.

ℒ⟦<FieldAccess>⟧ γ α σ ::=
    ℒ⟦<Primary> . <Id>⟧ γ α σ = α(r, τ, l, σ) where
      r = σ[l] and
      (l,τ) = γ[𝒱⟦<Primary> . <Id>⟧ γ]
  | ℒ⟦super . <Id>⟧ γ α σ = α(r, τ, l, σ) where
      r = σ[l] and
      (l,τ) = γ[𝒱⟦γ[&super] . <Id>⟧ γ]

The following semantics denote the process of invoking a method call. We have sim- 
plified the syntax here from the JLS by just specifying a Name for the method instead 
of separating the primary and super constructs. The concept for this access is detailed 
in the auxiliary functions that search the environment. The result of the environment 
search and retrieval is a function that takes an environment, command continuation 
and a store and returns an answer. These semantics evaluate the arguments, look up 
the function for the specified method and execute that function. 

ℰ⟦<MethodInv>⟧ γ κ σ ::=
    ℰ⟦<Name> ( <ArgList>? )⟧ γ κ σ =
      ℰ⟦<ArgList>⟧ γ κ₁ σ where
      ∀r,τ,σ₁. κ₁(r,τ,σ₁) = m(γ θ σ₁) where
        sig = getSigs(τ) and
        m = γ.getMethod(fst 𝒱⟦<Name>⟧ γ, sig) and
        ∀γ₂,σ₂. θ(γ₂,σ₂) = κ(γ₂[&returnVal], γ₂[&returnType], σ₂)

The following semantics are used to specify the creation of an instance of a class 
through the invocation of a new operator. Upon invocation of this operator the class 
needs to be loaded (if it has not already been loaded). Loading involves creation of storage space
in the store, execution of field initializers for static fields of the class, and execution 
of static constructors for the class. After the execution of these entities, only then is 
the explicit constructor invoked (and its arguments evaluated). Note the inclusion of 
the <ClassBody> construct in the last two productions. These are new as of Java 1.1 
and permit the construction of anonymous classes. For the sake of brevity, we do not 
include their semantics here. These semantics would be the same as the first semantics 
except that the execution method m would be the evaluation of the class body. 
ℰ⟦<ClassInstCreationExpr>⟧ γ κ σ ::=
    ℰ⟦new <ClassType> ( <ArgList>? )⟧ γ κ σ =
      ℰ⟦<ArgList>⟧ γ κ₁ σ₁ where
      σ₁ = γ.classLoader(fst(𝒱⟦<ClassType>⟧), σ) and
      ∀r,τ,σ₂. κ₁(r,τ,σ₂) = m(γ θ σ₂) where
        sig = getSigs(τ) and
        m = γ.getMethod(fst 𝒱⟦<ClassType>⟧ γ, sig) and
        ∀γ₂,σ₂. θ(γ₂,σ₂) = κ(γ₂[&returnVal], γ₂[&returnType], σ₂)
  | new <ClassType> ( <ArgList>? ) <ClassBody>
  | new <InterfaceType> () <ClassBody>
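
The ordering described above (static initialization of the class first, then evaluation of the constructor arguments and the constructor itself) can be observed with a sketch like the following. All names are illustrative, and the trace is only meaningful for the first instantiation, since static initialization happens once.

```java
// Illustrative sketch only; InitOrderDemo, Widget and note are hypothetical names.
class InitOrderDemo {
    static final StringBuilder log = new StringBuilder();

    static class Widget {
        static int count = note("static-field ");   // static field initializer
        static { log.append("static-block "); }     // static initializer block
        Widget(int arg) { log.append("constructor"); }
    }

    // Appends a label to the trace and returns a dummy value.
    static int note(String label) {
        log.append(label);
        return 0;
    }

    static String run() {
        new Widget(note("argument "));   // first use of Widget triggers its initialization
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```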

The array access expressions allow us to dereference an existing array and return 
the location of an element of the array. If that element is a reference type, then the 
location returned is the location that stores the reference and not the location of the
object itself. 

ℒ⟦<ArrayAccess>⟧ γ α σ ::=
    ℒ⟦<Name> [ <Expr> ]⟧ γ α σ = ℰ⟦<Expr>⟧ γ κ σ where
      ∀r,τ,σ₁. κ(r,τ,σ₁) = α(q, τ₁, l, σ₁) where
        v = fst(𝒱⟦<Name>⟧ γ) and
        a = γ.getArrayRef(v) and
        τ′ = unaryPromoteType(τ) and
        l = γ.getArrayElem(a, promote(τ′,(r,τ))) and
        τ₁ = γ.getArrayElemType(a) and
        q = σ(l)
  | ℒ⟦<PrimaryNoNewArray> [ <Expr> ]⟧ γ α σ =
      ℒ⟦<PrimaryNoNewArray>⟧ γ α₁ σ where
      ∀r₁,τ₁,l₁,σ₁. α₁(r₁,τ₁,l₁,σ₁) = ℰ⟦<Expr>⟧ γ κ σ₁ where
      ∀r₂,τ₂,σ₂. κ(r₂,τ₂,σ₂) = α(q, τ₁, l, σ₂) where
        τ′ = unaryPromoteType(τ₂) and
        l = γ.getArrayElem(r₁, promote(τ′,(r₂,τ₂))) and
        τ₁ = γ.getArrayElemType(r₁) and
        q = σ(l)
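
The distinction between the location of an array element and the object it refers to shows up when a slot is overwritten. The sketch below (illustrative names) also shows the bounds check that a JDK performs on access.

```java
// Illustrative sketch only; ArrayAccessDemo is a hypothetical name.
class ArrayAccessDemo {
    static String run() {
        StringBuilder[] cells = { new StringBuilder("a"), new StringBuilder("b") };
        StringBuilder alias = cells[0];       // copy of the reference stored in slot 0
        cells[0] = new StringBuilder("z");    // overwrites the slot, not the old object
        String outcome;
        try {
            outcome = cells[2].toString();    // index 2 is outside the array
        } catch (ArrayIndexOutOfBoundsException e) {
            outcome = "out-of-bounds";
        }
        return alias + "|" + cells[0] + "|" + outcome;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```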

The <ArgList> construction allows us to specify a list of expressions. The result 
is a list of pairs of values and types for each of the arguments in the argument list. 

ℰ⟦<ArgList>⟧ γ κ σ ::=
    [ ℰ⟦<Expr>⟧ γ κ σ ]
  | ℰ⟦<ArgList₁> , <Expr>⟧ γ κ σ = ℰ⟦<ArgList₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<Expr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q, τ, σ₂) where
        q = append(r₁,r₂) and
        τ = τ₁ + τ₂



Dims. The following three productions are used in cast expressions and array creation 
expressions to specify the dimensions of the created array. The information obtained 
from these productions is used to provide a count of the number of array indices and 
any specified dimension sizes. 
ℰ⟦<DimExprList>⟧ γ κ σ ::=
    ℰ⟦<DimExpr>⟧ γ κ σ
  | ℰ⟦<DimExprList₁> <DimExpr>⟧ γ κ σ = ℰ⟦<DimExprList₁>⟧ γ κ₁ σ where
      ∀r₁,τ₁,σ₁. κ₁(r₁,τ₁,σ₁) = ℰ⟦<DimExpr>⟧ γ κ₂ σ₁ where
      ∀r₂,τ₂,σ₂. κ₂(r₂,τ₂,σ₂) = κ(q, τ, σ₂) where
        q = append(r₁,r₂) and
        τ = mkArrayType(τ₁,1)

ℰ⟦<DimExpr>⟧ γ κ σ ::=
    ℰ⟦[ <Expr> ]⟧ γ κ σ = ℰ⟦<Expr>⟧ γ κ₁ σ where
      ∀r,τ,σ₁. κ₁(r,τ,σ₁) = κ(q, int[], σ₁) where
        τ′ = unaryPromoteType(τ) and
        q = [promote(τ′,(r,τ))]

𝒱⟦<Dims>⟧ γ ::=
    𝒱⟦[ ]⟧ γ = (1, int)
  | 𝒱⟦<Dims₁> [ ]⟧ γ = (v + 1, int) where
      (v,t) = 𝒱⟦<Dims₁>⟧ γ
A Programmer’s Reduction Semantics
for Classes and Mixins

Abstract. While class-based object-oriented programming languages
provide a flexible mechanism for re-using and managing related pieces 
of code, they typically lack linguistic facilities for specifying a uniform 
extension of many classes with one set of fields and methods. As a result, 
programmers are unable to express certain abstractions over classes. In 
this paper we develop a model of class-to-class functions that we refer to 
as mixins. A mixin function maps a class to an extended class by adding 
or overriding fields and methods. Programming with mixins is similar to 
programming with single inheritance classes, but mixins more directly 
encourage programming to interfaces. The paper develops these ideas 
within the context of Java. The results are 

1. an intuitive model of an essential Java subset; 

2. an extension that explains and models mixins; and 

3. type soundness theorems for these languages. 

1 Organizing Programs with Functions and Classes 

Object-oriented programming languages offer classes, inheritance, and overrid- 
ing to parameterize over program pieces for management purposes and re-use. 
Functional programming languages provide various flavors of functional abstrac- 
tions for the same purpose. The latter model was developed from a well-known, 
highly developed mathematical theory. The former grew in response to the need 
to manage large programs and to re-use as many components as possible. 

Each form of parameterization is useful for certain situations. With higher- 
order functions, a programmer can easily define many functions that share a 
similar core but differ in a few details. As many language designers and program- 
mers readily acknowledge, however, the functional approach to parameterization 
is best used in situations with a relatively small number of parameters. When 
a function must consume a large number of arguments, the approach quickly 
becomes unwieldy, especially if many of the arguments are the same for most of 
the function’s uses.¹

Class systems provide a simple and flexible mechanism for managing
collections of highly parameterized program pieces. Using class extension
(inheritance) and overriding, a programmer derives a new class by specifying only
the elements that change in the derived class. Nevertheless, a pure class-based
approach suffers from a lack of abstractions that specify uniform extensions and
modifications of classes. For example, the construction of a programming
environment may require many kinds of text editor frames, including frames that can
contain multiple text buffers and frames that support searching. In Java, we
cannot implement all combinations of multiple-buffer and searchable
frames using derived classes. If we choose to define a class for all multiple-buffer
frames, there can be no class that includes only searchable frames. Hence, we
must repeat the code that connects a frame to the search engine in at least two
branches of the class hierarchy: once for single-buffer searchable frames and again
for multiple-buffer searchable frames. If we could instead specify a mapping from
editor frame classes to searchable editor frame classes, then the code connecting
a frame to the search engine could be abstracted and maintained separately.

¹ Function entry points à la Fortran or keyword arguments à la Common Lisp are a
symptom of this problem, not a remedy.
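
The duplication described in this paragraph can be made concrete with a small Java sketch. The frame classes below are hypothetical and exist only for this illustration; the point is that the search wiring has to be written twice because a class can extend only one superclass.

```java
// Hypothetical frame classes; none of these names come from a real library.
class Frame {
    String describe() { return "frame"; }
}

class MultiBufferFrame extends Frame {
    @Override
    String describe() { return "multi-buffer " + super.describe(); }
}

class SearchableFrame extends Frame {
    // Search wiring, first copy.
    String search(String key) { return "searching " + describe() + " for " + key; }
}

class SearchableMultiBufferFrame extends MultiBufferFrame {
    // Search wiring, second copy: it cannot be inherited from SearchableFrame
    // because this class already extends MultiBufferFrame.
    String search(String key) { return "searching " + describe() + " for " + key; }
}
```

A mapping from frame classes to searchable frame classes, as proposed in the text, would let the body of `search` be written once and applied to both `Frame` and `MultiBufferFrame`.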

Some class-based object-oriented programming languages provide multiple
inheritance, which permits a programmer to create a class by extending more
than one class at once. A programmer who also follows a particular protocol
for such extensions can mimic the use of class-to-class functions. Common Lisp
programmers refer to this protocol as mixin programming, because it
roughly corresponds to mixing in additional ingredients during class creation.
Bracha and Cook designed a language of class manipulators that promote
mixin thinking in this style and permit programmers to build mixin-like classes.
Unfortunately, multiple inheritance and its cousins are semantically complex
and difficult for programmers to understand.² As a result, implementing a mixin
protocol with these approaches is error-prone and typically avoided.

For the design of MzScheme’s class and interface system, we experimented
with a different approach. In MzScheme, classes form a single inheritance
hierarchy, but are also first-class values that can be created and extended
at run-time. Once this capability was available, the programmers of our team
used it extensively for the construction of DrScheme, a Scheme programming
environment. However, a thorough analysis reveals that the code only contains
first-order functions on classes.

In this paper, we present a typed model of such “class functors” for Java.
We refer to the functors as mixins due to their similarity to Common Lisp’s 
multiple inheritance mechanism and Bracha’s class operators. Our proposal is 
superior in that it isolates the useful aspects of multiple inheritance yet retains 
the simple, intuitive nature of class-oriented Java programming. In the following 
section, we develop a calculus of Java classes. In the third section, we motivate 
mixins as an extension of classes using a small but illuminating example. The 
fourth section extends the type-theoretic model of Java to mixins. The last 
section considers implementation strategies for mixins and puts our work in 
perspective. 



² Dan Friedman determined in an informal poll in 1996 that almost nobody who
teaches C++ teaches multiple inheritance [pers. com.].
interface Place^i ...
interface Barrier^i ...
interface Door^i extends Place^i, Barrier^i ...

class Door^c extends Object^c implements Door^i {
  Room^c Enter(Person^c p) { ... }
}
class LockedDoor^c extends Door^c ...
class ShortDoor^c extends Door^c ...

[Diagram: the type graph of interfaces and classes determined by the program above]

Fig. 1. A program determines a static directed acyclic graph of types

[Diagram: a store-expression pair containing door.Enter(player) reduces to a pair containing room]

Fig. 2. Given a type graph, reductions map a store-expression pair to a new pair

2 A Model of Classes 

ClassicJava is a small but essential subset of sequential Java. To model its
type structure and semantics, we use well-known type elaboration and rewriting
techniques for Scheme and ML. Figures 1 and 2 illustrate our strategy.

Type elaboration verifies that a program defines a static tree of classes and 
a directed acyclic graph (dag) of interfaces. A type is simply a node in the 
combined graph. Each type is annotated with its collection of fields and methods, 
including those inherited from its ancestors. 

Evaluation is modeled as a reduction on expression-store pairs in the context
of a static type graph. Figure 2 demonstrates reduction using a pictorial
representation of the store as a graph of objects. Each object in the store is 
a class-tagged record of field values, where the tag indicates the run-time type 






Matthew Flatt, Shriram Krishnamurthi, and Matthias Felleisen 



P     = defn* e
defn  = class c extends c implements i* { field* meth* }
      | interface i extends i* { meth* }
field = t fd
meth  = t md ( arg* ) { body }
arg   = t var
body  = e | abstract
e     = new c | var | null | e : c .fd | e : c .fd = e
      | e.md (e*) | super ≡ this : c .md (e*)
      | view t e | let var = e in e
var   = a variable name or this
c     = a class name or Object
i     = interface name or Empty
fd    = a field name
md    = a method name
t     = c | i

Fig. 3. ClassicJava syntax; underlined phrases are inserted by elaboration and
are not part of the surface syntax



of the object and its field values are references to other objects. A single re- 
duction step may extend the store with a new object, or it may modify a field 
for an existing object in the store. Dynamic method dispatch is accomplished 
by matching the class tag of an object in the store with a node in the static 
class tree; a simple relation on this tree selects an appropriate method for the 
dispatch. 

The class model relies on as few implementation details as possible. For 
example, the model defines a mathematical relation, rather than a selection 
algorithm, to associate fields with classes for the purpose of type-checking and 
evaluation. Similarly, the reduction semantics only assumes that an expression 
can be partitioned into a proper redex and an (evaluation) context; it does not 
provide a partitioning algorithm. The model can easily be refined to expose more
implementation details.

2.1 ClassicJava Programs 

The syntax for ClassicJava is shown in Figure 3. A program P is a sequence
of class and interface definitions followed by an expression. Each class definition 
consists of a sequence of field declarations and a sequence of method declarations, 
while an interface consists of methods only. A method body in a class can be 
abstract, indicating that the method must be overridden in a subclass before 
the class is instantiated. A method body in an interface must be abstract. As 
in Java, classes are instantiated with the new operator, but there are no class 
constructors in ClassicJava; instance variables are always initialized to null. 
In the evaluation language for ClassicJava, field uses and super invocations 
are annotated by the type-checker with extra information (see the underlined 
parts of the syntax). Finally, the view and let forms represent Java’s casting 
expressions and local variable bindings, respectively. 

A valid ClassicJava program satisfies a number of simple predicates and 
relations; these are described in Figure 4. For example, the ClassesOnce(P)

The sets of names for variables, classes, interfaces, fields, and methods are assumed to be
mutually distinct. The meta-variable $T$ is used for method signatures of the form
$(t \ldots \rightarrow t)$, $V$ is used for variable lists of the form $(var \ldots)$, and $\Gamma$ is used for
environments mapping variables to types. Ellipses on the baseline ($\ldots$) indicate a repeated
pattern or continued sequence, while centered ellipses ($\cdots$) indicate arbitrary missing
program text (without straddling a class or interface definition).

ClassesOnce(P): Each class name is declared only once
  $\forall c, c'\ \ \mathbf{class}\ c \cdots \mathbf{class}\ c' \cdots \text{ is in } P \implies c \neq c'$

FieldOncePerClass(P): Field names in each class declaration are unique
  $\forall fd, fd'\ \ \mathbf{class} \cdots \{ \cdots fd \cdots fd' \cdots \} \text{ is in } P \implies fd \neq fd'$

MethodOncePerClass(P): Method names in each class declaration are unique
  $\forall md, md'\ \ \mathbf{class} \cdots \{ \cdots md\,(\cdots)\,\{\cdots\} \cdots md'\,(\cdots)\,\{\cdots\} \cdots \} \text{ is in } P \implies md \neq md'$

InterfacesOnce(P): Each interface name is declared only once
  $\forall i, i'\ \ \mathbf{interface}\ i \cdots \mathbf{interface}\ i' \cdots \text{ is in } P \implies i \neq i'$

InterfacesAbstract(P): Method declarations in an interface are abstract
  $\forall md, e\ \ \mathbf{interface} \cdots \{ \cdots md\,(\cdots)\,\{e\} \cdots \} \text{ is in } P \implies e = \mathbf{abstract}$

$\prec^c_P$: Class is declared as an immediate subclass
  $c \prec^c_P c' \iff \mathbf{class}\ c\ \mathbf{extends}\ c' \cdots \{ \cdots \} \text{ is in } P$

$\in\!\!\!\in^c_P$: Field is declared in a class
  $\langle c.fd, t \rangle \in\!\!\!\in^c_P c \iff \mathbf{class}\ c \cdots \{ \cdots t\ fd \cdots \} \text{ is in } P$

$\in\!\!\!\in^c_P$: Method is declared in a class
  $\langle md, (t_1 \ldots t_n \rightarrow t), (var_1 \ldots var_n), e \rangle \in\!\!\!\in^c_P c \iff \mathbf{class}\ c \cdots \{ \cdots t\ md\,(t_1\ var_1 \ldots t_n\ var_n)\,\{e\} \cdots \} \text{ is in } P$

$\prec^i_P$: Interface is declared as an immediate subinterface
  $i \prec^i_P i' \iff \mathbf{interface}\ i\ \mathbf{extends} \cdots i' \cdots \{ \cdots \} \text{ is in } P$

$\in\!\!\!\in^i_P$: Method is declared in an interface
  $\langle md, (t_1 \ldots t_n \rightarrow t), (var_1 \ldots var_n), e \rangle \in\!\!\!\in^i_P i \iff \mathbf{interface}\ i \cdots \{ \cdots t\ md\,(t_1\ var_1 \ldots t_n\ var_n)\,\{e\} \cdots \} \text{ is in } P$

$\prec\!\!\prec^c_P$: Class declares implementation of an interface
  $c \prec\!\!\prec^c_P i \iff \mathbf{class}\ c \cdots \mathbf{implements} \cdots i \cdots \{ \cdots \} \text{ is in } P$

$\leq^c_P$: Class is a subclass
  $\leq^c_P\ \equiv$ the transitive, reflexive closure of $\prec^c_P$

CompleteClasses(P): Classes that are extended are defined
  $\mathrm{rng}(\prec^c_P) \subseteq \mathrm{dom}(\prec^c_P) \cup \{\mathsf{Object}\}$

WellFoundedClasses(P): Class hierarchy is an order
  $\leq^c_P$ is antisymmetric

ClassMethodsOK(P): Method overriding preserves the type
  $\forall c, c', e, e', md, T, T', V, V'\ \ (\langle md, T, V, e \rangle \in\!\!\!\in^c_P c \text{ and } \langle md, T', V', e' \rangle \in\!\!\!\in^c_P c') \implies (T = T' \text{ or } c \not\leq^c_P c')$

$\in^c_P$: Field is contained in a class
  $\langle c'.fd, t \rangle \in^c_P c \iff \langle c'.fd, t \rangle \in\!\!\!\in^c_P c' \text{ and } c' = \min\{c'' \mid c \leq^c_P c'' \text{ and } \exists t' \text{ s.t. } \langle c''.fd, t' \rangle \in\!\!\!\in^c_P c''\}$

$\in^c_P$: Method is contained in a class
  $\langle md, T, V, e \rangle \in^c_P c \iff (\langle md, T, V, e \rangle \in\!\!\!\in^c_P c' \text{ and } c' = \min\{c'' \mid c \leq^c_P c'' \text{ and } \exists e', V' \text{ s.t. } \langle md, T, V', e' \rangle \in\!\!\!\in^c_P c''\})$

Table continues in Figure 5

Fig. 4. Predicates and relations in the model of Classic Java



predicate states that each class name is defined at most once in the program P. 
The $\prec^c_P$ relation associates each class name in P to the class it extends, and the
(overloaded) $\in\!\!\!\in^c_P$ relations capture the field and method declarations of P.

The syntax-summarizing relations induce a second set of relations and pred- 
icates that summarize the class structure of a program. The first of these is the 
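subclass relation $\leq^c_P$, which is a partial order if the CompleteClasses(P) and

$\leq^i_P$: Interface is a subinterface
  $\leq^i_P\ \equiv$ the transitive, reflexive closure of $\prec^i_P$

CompleteInterfaces(P): Extended/implemented interfaces are defined
  $\mathrm{rng}(\prec^i_P) \cup \mathrm{rng}(\prec\!\!\prec^c_P) \subseteq \mathrm{dom}(\prec^i_P) \cup \{\mathsf{Empty}\}$

WellFoundedInterfaces(P): Interface hierarchy is an order
  $\leq^i_P$ is antisymmetric

$\ll^c_P$: Class implements an interface
  $c \ll^c_P i \iff \exists c', i' \text{ s.t. } c \leq^c_P c' \text{ and } i' \leq^i_P i \text{ and } c' \prec\!\!\prec^c_P i'$

InterfaceMethodsOK(P): Redeclarations of methods are consistent
  $\forall i, i', md, T, T', V, V'\ \ (\langle md, T, V, \mathbf{abstract} \rangle \in\!\!\!\in^i_P i \text{ and } \langle md, T', V', \mathbf{abstract} \rangle \in\!\!\!\in^i_P i') \implies (T = T' \text{ or } i \not\leq^i_P i')$

$\in^i_P$: Method is contained in an interface
  $\langle md, T, V, \mathbf{abstract} \rangle \in^i_P i \iff \exists i' \text{ s.t. } i \leq^i_P i' \text{ and } \langle md, T, V, \mathbf{abstract} \rangle \in\!\!\!\in^i_P i'$

ClassesImplementAll(P): Classes supply methods to implement interfaces
  $\forall i, c\ \ c \prec\!\!\prec^c_P i \implies (\forall md, T, V\ \ \langle md, T, V, \mathbf{abstract} \rangle \in^i_P i \implies \exists e, V' \text{ s.t. } \langle md, T, V', e \rangle \in^c_P c)$

NoAbstractMethods(P, c): Class has no abstract methods (can be instantiated)
  $\forall md, T, V, e\ \ \langle md, T, V, e \rangle \in^c_P c \implies e \neq \mathbf{abstract}$

$\leq_P$: Type is a subtype
  $\leq_P\ \equiv\ \leq^c_P \cup \leq^i_P \cup \ll^c_P$

$\in_P$: Field or method is in a type
  $\in_P\ \equiv\ \in^c_P \cup \in^i_P$

Fig. 5. Predicates and relations continued from Figure 4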



WellFoundedClasses(P) predicates hold. In this case, the classes declared in 
P form a tree that has Object at its root. 

If the program describes a tree of classes, we can “decorate” each class in
the tree with the collection of fields and methods that it accumulates from local
declarations and inheritance. The source declaration of any field or method in
a class can be computed by finding the minimum (i.e., farthest from the root)
superclass that declares the field or method. This algorithm is described precisely
by the $\in^c_P$ relations. The $\in^c_P$ relation retains information about the source class
of each field, but it does not retain the source class for a method. This reflects
the property of Java classes that fields cannot be overridden (so instances of a
subclass always contain the field), while methods can be overridden (and may
become inaccessible).
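The difference between fields and methods can be observed directly in Java. The following sketch uses hypothetical classes `Base` and `Derived` (they do not appear in the paper): a field access is resolved against the static type of the receiver, much as the class-tagged field names of the model record the source class, while a method call uses the run-time class.

```java
class Base {
    String tag = "Base.tag";

    String describe() {
        return "Base.describe";
    }
}

class Derived extends Base {
    // A second, distinct field: every Derived instance still contains Base.tag as well.
    String tag = "Derived.tag";

    @Override
    String describe() {
        // Overrides Base.describe; the inherited body is no longer reachable from outside.
        return "Derived.describe";
    }
}

final class FieldVsMethodSketch {
    public static void main(String[] args) {
        Derived d = new Derived();
        Base asBase = d; // same object, viewed at the supertype

        System.out.println(asBase.tag);        // field chosen by static type Base
        System.out.println(d.tag);             // field chosen by static type Derived
        System.out.println(asBase.describe()); // method chosen by run-time class Derived
    }
}
```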

Interfaces have a similar set of relations: the superinterface declaration
relation $\prec^i_P$ induces a subinterface relation $\leq^i_P$. Unlike classes, a single interface
can have multiple proper superinterfaces, so the subinterface order forms a DAG
instead of a tree. The methods of an interface, as described by $\in^i_P$, are the union
of the interface’s declared methods and the methods of its superinterfaces.

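Finally, classes and interfaces are related by implements declarations, as
captured in the $\prec\!\!\prec^c_P$ relation. This relation is a set of edges joining the class tree
and the interface graph, completing the subtype picture of a program. A type in
the full graph is a subtype of all of its ancestors.

$\vdash_p$

[prog^c]
$$\frac{\begin{array}{c}
\text{ClassesOnce}(P) \quad \text{InterfacesOnce}(P) \quad \text{MethodOncePerClass}(P) \quad \text{FieldOncePerClass}(P) \\
\text{CompleteClasses}(P) \quad \text{WellFoundedClasses}(P) \quad \text{CompleteInterfaces}(P) \quad \text{WellFoundedInterfaces}(P) \\
\text{ClassFieldsOK}(P) \quad \text{ClassMethodsOK}(P) \quad \text{InterfaceMethodsOK}(P) \quad \text{InterfacesAbstract}(P) \\
\text{ClassesImplementAll}(P) \quad P \vdash_d defn_j \Rightarrow defn'_j \text{ for } j \in [1,n] \quad P,[\,] \vdash_e e \Rightarrow e' : t
\end{array}}{\vdash_p defn_1 \ldots defn_n\ e \Rightarrow defn'_1 \ldots defn'_n\ e' : t}
\quad \text{where } P = defn_1 \ldots defn_n\ e$$

$\vdash_d$

[defn^c]
$$\frac{P \vdash_t t_j \text{ for each } j \in [1,n] \qquad P,c \vdash_m meth_k \Rightarrow meth'_k \text{ for each } k \in [1,p]}{P \vdash_d \mathbf{class}\ c \cdots \{\, t_1\ fd_1 \ldots t_n\ fd_n\ \ meth_1 \ldots meth_p \,\} \Rightarrow \mathbf{class}\ c \cdots \{\, t_1\ fd_1 \ldots t_n\ fd_n\ \ meth'_1 \ldots meth'_p \,\}}$$

[defn^i]
$$\frac{P,i \vdash_m meth_j \Rightarrow meth'_j \text{ for each } j \in [1,p]}{P \vdash_d \mathbf{interface}\ i \cdots \{\, meth_1 \ldots meth_p \,\} \Rightarrow \mathbf{interface}\ i \cdots \{\, meth'_1 \ldots meth'_p \,\}}$$

$\vdash_m$

[meth]
$$\frac{P \vdash_t t \qquad P \vdash_t t_j \text{ for } j \in [1,n] \qquad P,[\mathbf{this} : t_o, var_1 : t_1, \ldots var_n : t_n] \vdash_s e \Rightarrow e' : t}{P,t_o \vdash_m t\ md\,(t_1\ var_1 \ldots t_n\ var_n)\,\{\, e \,\} \Rightarrow t\ md\,(t_1\ var_1 \ldots t_n\ var_n)\,\{\, e' \,\}}$$

$\vdash_e$

[new]
$$\frac{P \vdash_t c \qquad \text{NoAbstractMethods}(P, c)}{P,\Gamma \vdash_e \mathbf{new}\ c \Rightarrow \mathbf{new}\ c : c}$$

[var]
$$\frac{var \in \mathrm{dom}(\Gamma)}{P,\Gamma \vdash_e var \Rightarrow var : \Gamma(var)}$$

[null]
$$\frac{P \vdash_t t}{P,\Gamma \vdash_e \mathbf{null} \Rightarrow \mathbf{null} : t}$$

[get^c]
$$\frac{P,\Gamma \vdash_e e \Rightarrow e' : t' \qquad \langle c.fd, t \rangle \in_P t'}{P,\Gamma \vdash_e e.fd \Rightarrow e' \underline{: c}\,.fd : t}$$

[set^c]
$$\frac{P,\Gamma \vdash_e e \Rightarrow e' : t' \qquad \langle c.fd, t \rangle \in_P t' \qquad P,\Gamma \vdash_s e_v \Rightarrow e'_v : t}{P,\Gamma \vdash_e e.fd = e_v \Rightarrow e' \underline{: c}\,.fd = e'_v : t}$$

Rules continue in Figure 7

Fig. 6. Context-sensitive checks and type elaboration rules for Classic Java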



2.2 Classic Java Type Elaboration 

The type elaboration rules for Classic Java are defined by the following
judgements:

  $\vdash_p P \Rightarrow P' : t$              P elaborates to P' with type t
  $P \vdash_d defn \Rightarrow defn'$          defn elaborates to defn'
  $P, t \vdash_m meth \Rightarrow meth'$       meth in t elaborates to meth'
  $P, \Gamma \vdash_e e \Rightarrow e' : t$    e elaborates to e' with type t
  $P, \Gamma \vdash_s e \Rightarrow e' : t$    e has type t using subsumption
  $P \vdash_t t$                               t exists

The type elaboration rules translate expressions that access a field or call a 
super method into annotated expressions (see the underlined parts of Figure 3).
For field uses, the annotation contains the compile-time type of the instance 
expression, which determines the class containing the declaration of the accessed 
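field. For super method invocations, the annotation contains the compile-time

[call^c]
$$\frac{\begin{array}{c}
P,\Gamma \vdash_e e \Rightarrow e' : t' \qquad \langle md, (t_1 \ldots t_n \rightarrow t), (var_1 \ldots var_n), e_b \rangle \in_P t' \\
P,\Gamma \vdash_s e_j \Rightarrow e'_j : t_j \text{ for } j \in [1,n]
\end{array}}{P,\Gamma \vdash_e e.md\,(e_1 \ldots e_n) \Rightarrow e'.md\,(e'_1 \ldots e'_n) : t}$$

[super^c]
$$\frac{\begin{array}{c}
P,\Gamma \vdash_e \mathbf{this} \Rightarrow \mathbf{this} : c' \qquad c' \prec^c_P c \qquad \langle md, (t_1 \ldots t_n \rightarrow t), (var_1 \ldots var_n), e_b \rangle \in_P c \\
P,\Gamma \vdash_s e_j \Rightarrow e'_j : t_j \text{ for } j \in [1,n] \qquad e_b \neq \mathbf{abstract}
\end{array}}{P,\Gamma \vdash_e \mathbf{super}.md\,(e_1 \ldots e_n) \Rightarrow \mathbf{super}\, \underline{\equiv \mathbf{this} : c}\,.md\,(e'_1 \ldots e'_n) : t}$$

[wcast^c]
$$\frac{P,\Gamma \vdash_s e \Rightarrow e' : t}{P,\Gamma \vdash_e \mathbf{view}\ t\ e \Rightarrow e' : t}$$

[abs]
$$\frac{P \vdash_t t}{P,\Gamma \vdash_e \mathbf{abstract} \Rightarrow \mathbf{abstract} : t}$$

[ncast^c]
$$\frac{P,\Gamma \vdash_e e \Rightarrow e' : t' \qquad t \leq_P t' \text{ or } t \in \mathrm{dom}(\prec^i_P) \text{ or } t' \in \mathrm{dom}(\prec^i_P)}{P,\Gamma \vdash_e \mathbf{view}\ t\ e \Rightarrow \mathbf{view}\ t\ e' : t}$$

[let]
$$\frac{P,\Gamma \vdash_e e_1 \Rightarrow e'_1 : t_1 \qquad P,\Gamma[var : t_1] \vdash_e e_2 \Rightarrow e'_2 : t}{P,\Gamma \vdash_e \mathbf{let}\ var = e_1\ \mathbf{in}\ e_2 \Rightarrow \mathbf{let}\ var = e'_1\ \mathbf{in}\ e'_2 : t}$$

$\vdash_s$, $\vdash_t$

[sub^c]
$$\frac{P,\Gamma \vdash_e e \Rightarrow e' : t' \qquad t' \leq_P t}{P,\Gamma \vdash_s e \Rightarrow e' : t}$$

[type^c]
$$\frac{t \in \mathrm{dom}(\prec^c_P) \cup \mathrm{dom}(\prec^i_P) \cup \{\mathsf{Object}, \mathsf{Empty}\}}{P \vdash_t t}$$

Fig. 7. Rules continued from Figure 6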



type of this, which determines the class that contains the declaration of the 
method to be invoked. 

The complete typing rules are shown in Figures 6 and 7. A program is well-typed if
its class definitions and final expression are well-typed. A definition, in turn, is
well-typed when its field and method declarations use legal types and the method
body expressions are well-typed. Finally, expressions are typed and elaborated in
the context of an environment that binds free variables to types. For example, the
[get^c] and [set^c] rules for fields first determine the type of the instance expression,
and then calculate a class-tagged field name using $\in_P$; this yields both the type
of the field and the class for the installed annotation. In the [set^c] rule, the
right-hand side of the assignment must match the type of the field, but this match
may exploit subsumption to coerce the type of the value to a supertype. The
other expression typing rules are similarly intuitive.

2.3 ClassicJava Evaluation 

The operational semantics for Classic Java is defined as a contextual rewriting
system on pairs of expressions and stores. A store $\mathcal{S}$ is a mapping from objects
to class-tagged field records. A field record $\mathcal{F}$ is a mapping from elaborated field
names to values. The evaluation rules are a straightforward modification of those
for imperative Scheme.

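The complete evaluation rules are in Figure 8. For example, the call rule
invokes a method by rewriting the method call expression to the body of the
invoked method, syntactically replacing argument variables in this expression

$\mathsf{E} = [\,] \mid \mathsf{E} \underline{: c}\,.fd \mid \mathsf{E} \underline{: c}\,.fd = e \mid v \underline{: c}\,.fd = \mathsf{E}$
$\quad \mid \mathsf{E}.md(e \ldots) \mid v.md(v \ldots \mathsf{E}\ e \ldots)$
$\quad \mid \mathbf{super}\, \underline{\equiv v : c}\,.md(v \ldots \mathsf{E}\ e \ldots)$
$\quad \mid \mathbf{view}\ t\ \mathsf{E} \mid \mathbf{let}\ var = \mathsf{E}\ \mathbf{in}\ e$
$e = \ldots \mid object$
$v = object \mid \mathbf{null}$

[new]
  $P \vdash \langle \mathsf{E}[\mathbf{new}\ c], \mathcal{S} \rangle \hookrightarrow \langle \mathsf{E}[object], \mathcal{S}[object \mapsto \langle c, \mathcal{F} \rangle] \rangle$
  where $object \notin \mathrm{dom}(\mathcal{S})$ and $\mathcal{F} = \{ c'.fd \mapsto \mathbf{null} \mid c \leq^c_P c' \text{ and } \exists t \text{ s.t. } \langle c'.fd, t \rangle \in\!\!\!\in^c_P c' \}$

[get]
  $P \vdash \langle \mathsf{E}[object \underline{: c'}\,.fd], \mathcal{S} \rangle \hookrightarrow \langle \mathsf{E}[v], \mathcal{S} \rangle$
  where $\mathcal{S}(object) = \langle c, \mathcal{F} \rangle$ and $\mathcal{F}(c'.fd) = v$

[set]
  $P \vdash \langle \mathsf{E}[object \underline{: c'}\,.fd = v], \mathcal{S} \rangle \hookrightarrow \langle \mathsf{E}[v], \mathcal{S}[object \mapsto \langle c, \mathcal{F}[c'.fd \mapsto v] \rangle] \rangle$
  where $\mathcal{S}(object) = \langle c, \mathcal{F} \rangle$

[call]
  $P \vdash \langle \mathsf{E}[object.md(v_1, \ldots v_n)], \mathcal{S} \rangle \hookrightarrow \langle \mathsf{E}[e[object/\mathbf{this}, v_1/var_1, \ldots v_n/var_n]], \mathcal{S} \rangle$
  where $\mathcal{S}(object) = \langle c, \mathcal{F} \rangle$ and $\langle md, (t_1 \ldots t_n \rightarrow t), (var_1 \ldots var_n), e \rangle \in^c_P c$

[super]
  $P \vdash \langle \mathsf{E}[\mathbf{super}\, \underline{\equiv object : c'}\,.md(v_1, \ldots v_n)], \mathcal{S} \rangle \hookrightarrow \langle \mathsf{E}[e[object/\mathbf{this}, v_1/var_1, \ldots v_n/var_n]], \mathcal{S} \rangle$
  where $\langle md, (t_1 \ldots t_n \rightarrow t), (var_1 \ldots var_n), e \rangle \in^c_P c'$

[cast]
  $P \vdash \langle \mathsf{E}[\mathbf{view}\ t'\ object], \mathcal{S} \rangle \hookrightarrow \langle \mathsf{E}[object], \mathcal{S} \rangle$
  where $\mathcal{S}(object) = \langle c, \mathcal{F} \rangle$ and $c \leq_P t'$

[let]
  $P \vdash \langle \mathsf{E}[\mathbf{let}\ var = v\ \mathbf{in}\ e], \mathcal{S} \rangle \hookrightarrow \langle \mathsf{E}[e[v/var]], \mathcal{S} \rangle$

[xcast]
  $P \vdash \langle \mathsf{E}[\mathbf{view}\ t'\ object], \mathcal{S} \rangle \hookrightarrow \langle \text{error: bad cast}, \mathcal{S} \rangle$
  where $\mathcal{S}(object) = \langle c, \mathcal{F} \rangle$ and $c \not\leq_P t'$

[nget]
  $P \vdash \langle \mathsf{E}[\mathbf{null} \underline{: c}\,.fd], \mathcal{S} \rangle \hookrightarrow \langle \text{error: dereferenced null}, \mathcal{S} \rangle$

[nset]
  $P \vdash \langle \mathsf{E}[\mathbf{null} \underline{: c}\,.fd = v], \mathcal{S} \rangle \hookrightarrow \langle \text{error: dereferenced null}, \mathcal{S} \rangle$

[ncall]
  $P \vdash \langle \mathsf{E}[\mathbf{null}.md(v_1, \ldots v_n)], \mathcal{S} \rangle \hookrightarrow \langle \text{error: dereferenced null}, \mathcal{S} \rangle$

Fig. 8. Operational semantics for Classic Java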



with the supplied argument values. The dynamic aspect of method calls is im- 
plemented by selecting the method based on the run-time type of the object (in 
the store). In contrast, the super reduction performs super method selection 
using the class annotation that is statically determined by the type-checker. 
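As a small worked instance of the rules in Figure 8, consider a sketch under stated assumptions: a program that declares a class $c$ with a single field $fd$, a final expression that allocates a $c$ and reads the field, and an object name $o$ chosen fresh by [new]. None of these names come from the paper. Each step rewrites the redex inside its evaluation context and leaves the rest of the configuration alone.

```latex
\begin{align*}
  & \langle \mathbf{let}\ x = \mathbf{new}\ c\ \mathbf{in}\ x \underline{: c}\,.fd,\ \emptyset \rangle \\
  \hookrightarrow{} & \langle \mathbf{let}\ x = o\ \mathbf{in}\ x \underline{: c}\,.fd,\ \mathcal{S}_1 \rangle
      && \text{[new], with } \mathcal{S}_1 = \{ o \mapsto \langle c, \{ c.fd \mapsto \mathbf{null} \} \rangle \} \\
  \hookrightarrow{} & \langle o \underline{: c}\,.fd,\ \mathcal{S}_1 \rangle
      && \text{[let], substituting } o \text{ for } x \\
  \hookrightarrow{} & \langle \mathbf{null},\ \mathcal{S}_1 \rangle
      && \text{[get], since } \mathcal{S}_1(o) = \langle c, \mathcal{F} \rangle \text{ and } \mathcal{F}(c.fd) = \mathbf{null}
\end{align*}
```

The final configuration holds the value null, which is one of the outcomes that an evaluation is allowed to reach.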

2.4 Classic Java Soundness 

For a program of type t, the evaluation rules for Classic Java produce either a 
value that has a subtype of t, or one of two errors. Put differently, an evaluation 
cannot get stuck. This property can be formulated as a type soundness theorem. 



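Theorem 1 (Type Soundness). If $\vdash_p P \Rightarrow P' : t$ and
$P' = defn_1 \ldots defn_n\ e$, then either

- $P' \vdash \langle e, \emptyset \rangle \hookrightarrow^* \langle object, \mathcal{S} \rangle$ and $\mathcal{S}(object) = \langle t', \mathcal{F} \rangle$ and $t' \leq_P t$; or

- $P' \vdash \langle e, \emptyset \rangle \hookrightarrow^* \langle \mathbf{null}, \mathcal{S} \rangle$; or

- $P' \vdash \langle e, \emptyset \rangle \hookrightarrow^* \langle \text{error: bad cast}, \mathcal{S} \rangle$; or

- $P' \vdash \langle e, \emptyset \rangle \hookrightarrow^* \langle \text{error: dereferenced null}, \mathcal{S} \rangle$.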
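The two error outcomes have familiar counterparts in Java itself. The sketch below is an analogy only, not part of the formal model: the nested classes `Door` and `LockedDoor` and the helper names are assumptions made for illustration. It triggers a failed downcast and a null dereference and reports each using the wording of the reduction rules.

```java
final class SoundnessErrorsSketch {
    static class Door {
        Door next; // like a ClassicJava field, this starts out as null
    }

    static class LockedDoor extends Door {
    }

    /** Analogue of [xcast]: the run-time class Door is not a subtype of LockedDoor. */
    static String badCast() {
        Door d = new Door();
        try {
            LockedDoor l = (LockedDoor) d;
            return "no error: " + l;
        } catch (ClassCastException e) {
            return "error: bad cast";
        }
    }

    /** Analogue of [nget]: d.next is null, so reading a field of it fails. */
    static String nullDereference() {
        Door d = new Door();
        try {
            Door n = d.next.next;
            return "no error: " + n;
        } catch (NullPointerException e) {
            return "error: dereferenced null";
        }
    }

    public static void main(String[] args) {
        System.out.println(badCast());
        System.out.println(nullDereference());
    }
}
```

Both programs pass the Java type checker, just as the corresponding ClassicJava expressions are well-typed; the errors are detected only when the offending step is evaluated.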

The main lemma in support of this theorem states that each step taken in the
evaluation preserves the type correctness of the expression-store pair (relative
to the program). Specifically, for a configuration on the left-hand side of an
evaluation step, there exists a type environment that establishes the expression’s
type as some t. This environment must be consistent with the store.

Definition 2 (Environment-Store Consistency). 

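$P,\Gamma \vdash_\sigma \mathcal{S} \iff (\mathcal{S}(object) = \langle c, \mathcal{F} \rangle$

  $\Sigma_1$: $\implies \Gamma(object) = c$

  $\Sigma_2$: and $\mathrm{dom}(\mathcal{F}) = \{ c_1.fd \mid \langle c_1.fd, c_2 \rangle \in^c_P c \}$

  $\Sigma_3$: and $\mathrm{rng}(\mathcal{F}) \subseteq \mathrm{dom}(\mathcal{S}) \cup \{\mathbf{null}\}$

  $\Sigma_4$: and $(\mathcal{F}(c_1.fd) = object' \text{ and } \langle c_1.fd, c_2 \rangle \in^c_P c) \implies ((\mathcal{S}(object') = \langle c', \mathcal{F}' \rangle) \implies c' \leq_P c_2))$

  $\Sigma_5$: and $object \in \mathrm{dom}(\Gamma) \implies object \in \mathrm{dom}(\mathcal{S})$

  $\Sigma_6$: and $\mathrm{dom}(\mathcal{S}) \subseteq \mathrm{dom}(\Gamma)$.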

Note that the environment may contain bindings for lexical variables, which are 
not store objects. 

Since the rewriting rules reduce annotated terms, we derive new type
judgements that relate annotated terms. Each of the new rules performs exactly the
same checks as the rule it is derived from, but does not add any annotation. Thus
$\vdash_{\underline{p}}$ is derived from $\vdash_p$, and so forth. Only the judgement on expressions ($\vdash_{\underline{e}}$) is
altered slightly: we retain the view operation in all cases and ignore the [wcast^c]
relation, which is only an optimization that removes an unnecessary check. This
relaxation obviously does not change the type-checking or extensional behavior
of any programs.

The following lemmata are used to prove the main lemma. 

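Lemma 3 (Free). If $P,\Gamma \vdash_{\underline{e}} e : t$ and $a \notin \mathrm{dom}(\Gamma)$, then $P,\Gamma[a : t'] \vdash_{\underline{e}} e : t$.

Proof. This follows by reasoning about the shape of the derivation. □

Lemma 4 (Replacement). If $P,\Gamma \vdash_{\underline{e}} \mathsf{E}[e] : t$, $P,\Gamma \vdash_{\underline{e}} e : t'$, and $P,\Gamma \vdash_{\underline{e}} e' : t'$,
then $P,\Gamma \vdash_{\underline{e}} \mathsf{E}[e'] : t$.

Proof. This follows by a replacement argument in the derivation tree. □

Lemma 5 (Substitution). If $P,\Gamma[var_1 : t_1, \ldots var_n : t_n] \vdash_{\underline{e}} e : t$ and
$\{var_1, \ldots var_n\} \cap \mathrm{dom}(\Gamma) = \emptyset$ and $P,\Gamma \vdash_{\underline{s}} v_i : t_i$ for $i \in [1,n]$, then
$P,\Gamma \vdash_{\underline{e}} e[v_1/var_1, \ldots v_n/var_n] : t$.

Proof. Let $\sigma$ denote the substitution $[v_1/var_1, \ldots v_n/var_n]$, and $e' = \sigma(e)$. The
proof proceeds by induction over the shape of the derivation showing that
$P,\Gamma\sigma \vdash_{\underline{e}} e : t$, where $\Gamma\sigma$ abbreviates $\Gamma[var_1 : t_1, \ldots var_n : t_n]$. We perform a
case analysis on the last step.

Case $e = \mathbf{new}\ c$. $P,\Gamma\sigma \vdash_{\underline{e}} e : c$ and $P,\Gamma \vdash_{\underline{e}} e' : c$.

Case $e = var$. If $var \notin \mathrm{dom}(\sigma)$, then $var$ must be in $\mathrm{dom}(\Gamma)$. Thus $P,\Gamma\sigma \vdash_{\underline{e}} var : t$
iff $P,\Gamma \vdash_{\underline{e}} var : t$. Otherwise, $var = var_i$ for some $i \in [1,n]$ and $P,\Gamma\sigma \vdash_{\underline{e}} var : t_i$.
But $P,\Gamma \vdash_{\underline{s}} v_i : t_i$ and $e' = \sigma(e) = \sigma(var_i) = v_i$, so $P,\Gamma \vdash_{\underline{e}} e' : t_i$.

Case $e = \mathbf{null}$. By [null], any type is derivable.

Case $e = e_1 \underline{: c}\,.fd$. $P,\Gamma\sigma \vdash_{\underline{e}} e_1 : t'$ and $\langle c.fd, t \rangle \in_P t'$ follow from the antecedents.
By induction, $P,\Gamma \vdash_{\underline{e}} \sigma(e_1) : t'$. Therefore $P,\Gamma \vdash_{\underline{e}} \sigma(e_1) : t''$, where $t''$ is a
subtype of $t'$. Hence, $\langle c.fd, t \rangle \in_P t''$. Thus $P,\Gamma \vdash_{\underline{e}} \sigma(e_1) \underline{: c}\,.fd : t$.

Case $e = e_1 \underline{: c}\,.fd = e_2$. This case is similar to the one above.

Case $e = \mathbf{view}\ t\ e_1$. $P,\Gamma\sigma \vdash_{\underline{e}} e_1 : t'$ and $t \leq_P t'$ follow from the antecedent.
Inductively, $P,\Gamma \vdash_{\underline{e}} \sigma(e_1) : t''$ for $t'' \leq_P t'$. If $t \leq_P t''$ or $t'' \leq_P t$, then
$P,\Gamma \vdash_{\underline{e}} \mathbf{view}\ t\ \sigma(e_1) : t$ (by our relaxed [ncast^c] rule).

Case $e = \mathbf{let}\ var = e_1\ \mathbf{in}\ e_2$. Let $\sigma_1 = \sigma$. From [let], we get $P,\Gamma\sigma_1 \vdash_{\underline{e}} e_1 : t_1$.
Let $\sigma_2$ be the substitution $[var : t_1]$. Then $P,\Gamma\sigma_2\sigma_1 \vdash_{\underline{e}} e_2 : t$. By induction,
$P,\Gamma \vdash_{\underline{e}} \sigma_1(e_1) : t_1$ and $P,\Gamma\sigma_2 \vdash_{\underline{e}} \sigma_1(e_2) : t$. By using LemmaH for each term,
$P,\Gamma \vdash_{\underline{e}} \sigma_1(\mathbf{let}\ var = e_1\ \mathbf{in}\ e_2) : t$.

Case $e = e_0.md\,(e_1, \ldots e_n)$. Typability of the expression implies $P,\Gamma\sigma \vdash_{\underline{s}} e_i : t_i$
for $i \in [1,n]$ and $P,\Gamma\sigma \vdash_{\underline{e}} e_0 : t_0$ where $\langle md, (t_1 \ldots t_n \rightarrow t), (var_1, \ldots, var_n), e \rangle \in_P t_0$.
By induction, $P,\Gamma \vdash_{\underline{e}} \sigma(e_i) : t_i$ for each $e_i$, and $P,\Gamma \vdash_{\underline{e}} \sigma(e_0) : t_0'$ where
$t_0' \leq_P t_0$, which implies that $\langle md, (t_1 \ldots t_n \rightarrow t), (var_1, \ldots, var_n), e \rangle \in_P t_0'$.
Thus $P,\Gamma \vdash_{\underline{e}} \sigma(e_0.md\,(e_1, \ldots e_n)) : t$.

Case $e = \mathbf{super}\, \underline{\equiv \mathbf{this} : c}\,.md\,(e_1, \ldots e_n)$. This follows in a similar fashion
to the rule above. Since the class $c$ is embedded in the expression, and
the induction yields a subtype of the original type, this can be subsumed
appropriately to instantiate the method in the superclass. □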



Lemma 6. If $P,\Gamma \vdash_{\underline{e}} \mathsf{E}[e] : t$, $P,\Gamma \vdash_{\underline{e}} e : t'$, and $P,\Gamma \vdash_{\underline{e}} e' : t''$ where $t'' \leq_P t'$,
then $P,\Gamma \vdash_{\underline{e}} \mathsf{E}[e'] : t$.

Proof. The proof is by induction on the depth of the evaluation context $\mathsf{E}$. If $\mathsf{E}$
is the empty context $[\,]$ we are done. Otherwise, partition $\mathsf{E}[e] = \mathsf{E}_1[\mathsf{E}_2[e]]$ where
$\mathsf{E}_2$ is a singular evaluation context, i.e., a context whose depth is one. Consider
the shape of $\mathsf{E}_2[\bullet]$, which must be one of:

Case $\bullet \underline{: c}\,.fd$. Since $c$ is fixed, $\bullet$’s type does not matter: the result is the type
of the field.

Case $\bullet \underline{: c}\,.fd = e$. Compare to the previous case.

Case $v \underline{: c}\,.fd = \bullet$. Since $t'' \leq_P t'$, the type of $\bullet$ is $t'$ by subsumption and the
type of the expression is unchanged.

Case $\bullet.md(e \ldots)$. Since $t'' \leq_P t'$ and methods in an inheritance chain must
preserve the type, the result of method application is the same type.

Case $v.md(v \ldots \bullet\ e \ldots)$. By subsumption, arguments have the declared type
by [meth]; $t''$ can be $t'$ by subsumption.

Case $\mathbf{super}\, \underline{\equiv v : c}\,.md(v \ldots \bullet\ e \ldots)$. Analogous to the previous case.

Case $\mathbf{view}\ t\ \bullet$. The type of this expression is the same regardless of $\bullet$. Since $\vdash_{\underline{e}}$
has the less restrictive condition for [ncast^c] that $t$ and $t''$ be comparable
by $\leq_P$, the typing proceeds even if $t''$ is a subtype of $t$.

Case $\mathbf{let}\ var = \bullet\ \mathbf{in}\ e_2$. We are given $P,\Gamma \vdash_{\underline{e}} e : t'$, so from [let], $P,\Gamma\sigma_1 \vdash_{\underline{e}} e_2 : t_1$
for some type $t_1$ where $\sigma_1$ is $[var : t']$. We must show that $P,\Gamma\sigma_2 \vdash_{\underline{e}} e_2 : t_1$
where $\sigma_2 = [var : t'']$. This follows from Lemma 8.

Definition 7. $\Gamma \leq_\Gamma \Gamma'$ if $\mathrm{dom}(\Gamma) = \mathrm{dom}(\Gamma')$ and $\forall v \in \mathrm{dom}(\Gamma)$, $\Gamma'(v) \leq_P \Gamma(v)$.

Lemma 8. If $P,\Gamma \vdash_{\underline{e}} e : t$ and $\Gamma \leq_\Gamma \Gamma'$, then $P,\Gamma' \vdash_{\underline{s}} e : t$.

Proof. The proof is a simple adaptation of that of Lemma^ □ 

We can now prove the subject reduction lemma. Since Classic Java does not 
include any primitives, its type soundness follows by induction over this result. 

Definition 9 (Error Configuration). An error configuration is any one of
[xcast], [nget], [nset], and [ncall].



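Lemma 10 (Subject Reduction). If $P,\Gamma \vdash_{\underline{e}} e : t$, $P,\Gamma \vdash_\sigma \mathcal{S}$, $\mathrm{fv}(e) \subseteq \mathrm{dom}(\Gamma)$,
and $\langle e, \mathcal{S} \rangle \hookrightarrow \langle e', \mathcal{S}' \rangle$, then $e'$ is an answer, $e'$ is an error configuration, or $\exists\, \Gamma'$
such that

1. $P,\Gamma' \vdash_{\underline{e}} e' : t$,

2. $P,\Gamma' \vdash_\sigma \mathcal{S}'$.

Proof. The proof examines the structure of the reduction step. For each case,
we construct the new environment $\Gamma'$ and show that, if execution has not halted
with an answer or in an error configuration, the two consequents of the theorem
are satisfied relative to the new expression, store, and environment.

Case [new]. Set $\Gamma' = \Gamma[object : c]$.

1. We have $P,\Gamma \vdash_{\underline{e}} \mathsf{E}[\mathbf{new}\ c] : t$. From $\Sigma_5$, $object \notin \mathrm{dom}(\mathcal{S}) \implies object \notin \mathrm{dom}(\Gamma)$.
Thus $P,\Gamma' \vdash_{\underline{e}} \mathsf{E}[\mathbf{new}\ c] : t$ by Lemma 3. Since $P,\Gamma' \vdash_{\underline{e}} \mathbf{new}\ c : c$
and $P,\Gamma' \vdash_{\underline{e}} object : c$ we use Lemma 4 to get $P,\Gamma' \vdash_{\underline{e}} \mathsf{E}[object] : t$.

2. Let $\mathcal{S}'(object) = \langle c, \mathcal{F} \rangle$. $object$ is the only new element in $\mathrm{dom}(\mathcal{S}')$.

$\Sigma_1$: $\Gamma'(object) = c$.

$\Sigma_2$: $\mathrm{dom}(\mathcal{F})$ is correct by construction.

$\Sigma_3$: $\mathrm{rng}(\mathcal{F}) = \{\mathbf{null}\}$.

$\Sigma_4$: Since $\mathrm{rng}(\mathcal{F}) = \{\mathbf{null}\}$, this property is unaffected.

$\Sigma_5$ and $\Sigma_6$: The only change to $\Gamma$ and $\mathcal{S}$ is $object$.

Case [get]. Set $\Gamma' = \Gamma$. Let $t'$ be the type such that $P,\Gamma \vdash_{\underline{e}} object \underline{: c'}\,.fd : t'$.
$P,\Gamma \vdash_{\underline{e}} \mathsf{E}[object \underline{: c'}\,.fd] : t$ implies that $\Gamma(object) \leq_P c'$. Thus $\mathcal{S}(object) =
\langle c, \mathcal{F} \rangle$ with $c'.fd \in \mathrm{dom}(\mathcal{F})$.

1. If $v$ is null, it can be cast to $t'$, so $P,\Gamma' \vdash_{\underline{e}} \mathsf{E}[v] : t$ by Lemma 4. If $v$ is
not null, by $\Sigma_4$, $\mathcal{S}(v) = \langle c'', \mathcal{F}' \rangle$ where $c'' \leq_P t'$. By Lemma 6, $P,\Gamma' \vdash_{\underline{e}} \mathsf{E}[v] : t$.

2. $\mathcal{S}$ and $\Gamma$ are unchanged.

Case [set].

1. The proof is by a straightforward extension of the proof for [get].

2. The only change to the store is a field update; thus only $\Sigma_3$ and $\Sigma_4$ are
affected. Let $v$ be the assigned value. Assume $v$ is non-null.

$\Sigma_3$: Since $v$ is typable, it must be in $\mathrm{dom}(\Gamma)$. By $\Sigma_5$ it is therefore in
$\mathrm{dom}(\mathcal{S})$.

$\Sigma_4$: The typing of the active expression indicates that the type of $v$ can
be treated as the type of the field $fd$ by subsumption. Combining
this with $\Sigma_1$ indicates that the type tag of $v$ will preserve $\Sigma_4$.

Case [call]. From $P,\Gamma \vdash_{\underline{e}} object.md(v_1, \ldots v_n) : t$ we know $P,\Gamma \vdash_{\underline{e}} object : t'$,
$P,\Gamma \vdash_{\underline{s}} v_i : t_i$ for $i$ in $[1,n]$, and $\langle md, (t_1 \ldots t_n \rightarrow t), (var_1, \ldots, var_n), e \rangle \in_P t'$.
The type-checking of $P$ proves that $P,t_o \vdash_{\underline{m}} t\ md\,(t_1\ var_1, \ldots t_n\ var_n)\,\{e\}$,
which implies that $P,[\mathbf{this} : t_o, var_1 : t_1, \ldots var_n : t_n] \vdash_{\underline{e}} e : t$ where
$t_o$ is the defining class of $md$. Further, we know that $t' \leq_P t_o$ from $\in^c_P$ for
methods and ClassMethodsOK(P).

1. Lemma 5 shows that $P,\Gamma \vdash_{\underline{e}} e[object/\mathbf{this}, v_1/var_1, \ldots v_n/var_n] : t$.

2. $\mathcal{S}' = \mathcal{S}$ and $\Gamma'$ does not bind new addresses, so $\vdash_\sigma$ is preserved.

Case [super]. The proof is essentially the same as that for [call].

Case [let]. $P,\Gamma \vdash_{\underline{e}} \mathbf{let}\ var = v\ \mathbf{in}\ e : t$ implies $P,\Gamma \vdash_{\underline{e}} v : t'$ for some type $t'$. Set
$\Gamma' = \Gamma[var : t']$. From [let], $P,\Gamma' \vdash_{\underline{e}} e : t$.

1. By Lemma 5, $P,\Gamma' \vdash_{\underline{e}} e[v/var] : t$.

2. The store is unchanged and the only addition to the environment is not
an object, so the store relation holds. □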



2.5 Related Work on Classes 

Our model for class-based object-oriented languages is similar to two recently
published semantics for Java, but entirely motivated by prior work on
Scheme and ML models. The approach is fundamentally different from
most of the previous work on the semantics of objects. Much of that work has
focused on interpreting object systems and the underlying mechanisms via record
extensions of lambda calculi or as “native” object calculi (with
a record flavor). In our semantics, types are simply the names of entities
declared in the program; the collection of types forms a DAG, which is specified
by the programmer. The collection of types is static during evaluation and is
only used for field and method lookups and casts. The evaluation rules describe
how to transform statements, formed over the given type context, into plain
values. The rules work on plain program text such that each intermediate stage
of the evaluation is a complete program. In short, the model is as simple and
intuitive as that of first-order functional programming enriched with a language
for expressing hierarchical relationships among data types.

3 From Classes to Mixins: An Example

Implementing a maze adventure game [page 81] illustrates the need for
adding mixins to a class-based language. A player in the adventure game wanders
through rooms and doors in a virtual world. All locations in the virtual world
share some common behavior, but also differ in a wide variety of properties that
make the game interesting. For example, there are many kinds of doors, including
locked doors, magic doors, doors of varying heights, and doors that combine
several varieties into one. The natural class-based approach for implementing
different kinds of doors is to implement each variation with a new subclass of
a basic door class, Door^c. The left side of Figure 9 shows the Java definition
for two simple Door^c subclasses, LockedDoor^c and ShortDoor^c. An instance of
LockedDoor^c requires a key to open the door, while an instance of ShortDoor^c
requires the player to duck before walking through the door.

A subclassing approach to the implementation of doors seems natural at 
first because the programmer declares only what is different in a particular door 
variation as compared to some other door variation. Unfortunately, since the
superclass of each variation is fixed, door variations cannot be composed into more
complex, and thus more interesting, variations. For example, the LockedDoor^c
and ShortDoor^c classes cannot be combined to create a new LockedShortDoor^c
class for doors that are both locked and short.

A mixin approach solves this problem. Using mixins, the programmer declares 
how a particular door variation differs from an arbitrary door variation. This 
creates a function from door classes to door classes, using an interface as the 
input type. Each basic door variation is defined as a separate mixin. These 
mixins are then functionally composed to create many different kinds of doors. 
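Java has no mixin construct, but the idea of a function from door classes to door classes can be approximated with wrappers. The sketch below is an approximation under stated assumptions, not the MixedJava semantics: `MixinSketch`, `BasicDoor`, `LockedMixin`, and `ShortMixin` are hypothetical names, the item `"key"` and the height limit of 1 are placeholders, and a call to `parent` stands in for `super`. Delegation of this kind operates on objects rather than classes, so it does not reproduce inheritance in full.

```java
import java.util.Set;
import java.util.function.UnaryOperator;

final class MixinSketch {
    static final class Person {
        final Set<String> items;
        final int height;

        Person(Set<String> items, int height) {
            this.items = items;
            this.height = height;
        }

        boolean hasItem(String item) {
            return items.contains(item);
        }
    }

    /** The interface that every door variation relies on and provides. */
    interface Door {
        boolean canOpen(Person p);
        boolean canPass(Person p);
    }

    /** A plain door: anyone can open it and walk through it. */
    static final class BasicDoor implements Door {
        public boolean canOpen(Person p) { return true; }
        public boolean canPass(Person p) { return true; }
    }

    /** "Locked" knows only the Door interface of the door it extends. */
    static final class LockedMixin implements Door {
        private final Door parent; // plays the role of the superclass

        LockedMixin(Door parent) { this.parent = parent; }

        public boolean canOpen(Person p) { return p.hasItem("key") && parent.canOpen(p); }
        public boolean canPass(Person p) { return parent.canPass(p); }
    }

    /** "Short" adds a height check and otherwise defers to the door it extends. */
    static final class ShortMixin implements Door {
        private final Door parent;

        ShortMixin(Door parent) { this.parent = parent; }

        public boolean canOpen(Person p) { return parent.canOpen(p); }
        public boolean canPass(Person p) { return p.height <= 1 && parent.canPass(p); }
    }

    // Each variation is a function from doors to doors, so variations compose freely.
    static final UnaryOperator<Door> LOCKED = LockedMixin::new;
    static final UnaryOperator<Door> SHORT = ShortMixin::new;

    static Door lockedShortDoor() {
        return LOCKED.apply(SHORT.apply(new BasicDoor()));
    }

    public static void main(String[] args) {
        Door door = lockedShortDoor();
        Person shortWithKey = new Person(Set.of("key"), 1);
        Person tallWithoutKey = new Person(Set.of(), 2);
        System.out.println(door.canOpen(shortWithKey) && door.canPass(shortWithKey));
        System.out.println(door.canOpen(tallWithoutKey) || door.canPass(tallWithoutKey));
    }
}
```

Because each wrapper depends only on `Door`, the order and number of variations can be chosen at the point of composition, which is exactly what a fixed superclass prevents.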

A programmer implements mixins in exactly the same way as a derived class,
except that the programmer cannot rely on the implementation of the mixin’s
superclass, only on its interface. We consider this an advantage of mixins because
it enforces the maxim “program to an interface, not an implementation” [page
11].

The right side of Figure 9 shows how to define mixins for locked and short
doors. The mixin Locked^m is nearly identical to the original LockedDoor^c class
definition, except that the superclass is specified via the interface Door^i. The new
LockedDoor^c and ShortDoor^c classes are created by applying Locked^m and Short^m

Footnote: Dynamic class loading could be expressed in this framework as an addition to the
static context. Still, the context remains the same for most of the evaluation.

Class definitions (left side of the figure):

    class LockedDoor^c extends Door^c {
      boolean canOpen(Person^c p) {
        if (!p.hasItem(theKey)) {
          System.out.println("You don't have the Key");
          return false;
        }
        System.out.println("Using key...");
        return super.canOpen(p);
      }
    }

    class ShortDoor^c extends Door^c {
      boolean canPass(Person^c p) {
        if (p.height() > 1) {
          System.out.println("You are too tall");
          return false;
        }
        System.out.println("Ducking into door...");
        return super.canPass(p);
      }
    }

    /* Cannot merge for LockedShortDoor^c */

Mixin definitions (right side of the figure):

    interface Door^i {
      boolean canOpen(Person^c p);
      boolean canPass(Person^c p);
    }

    mixin Locked^m extends Door^i {
      boolean canOpen(Person^c p) {
        if (!p.hasItem(theKey)) {
          System.out.println("You don't have the Key");
          return false;
        }
        System.out.println("Using key...");
        return super.canOpen(p);
      }
    }

    mixin Short^m extends Door^i {
      boolean canPass(Person^c p) {
        if (p.height() > 1) {
          System.out.println("You are too tall");
          return false;
        }
        System.out.println("Ducking into door...");
        return super.canPass(p);
      }
    }

    class LockedDoor^c = Locked^m(Door^c);
    class ShortDoor^c = Short^m(Door^c);
    class LockedShortDoor^c = Locked^m(Short^m(Door^c));

Fig. 9. Some class definitions and their translation to composable mixins



to the class Door^c, respectively. Similarly, applying Locked^m to ShortDoor^c yields
a class for locked, short doors.

Consider another door variation: MagicDoor^c, which is similar to LockedDoor^c
except the player needs a book of spells instead of a key. We can extract the
common parts of the implementation of MagicDoor^c and LockedDoor^c into a
new mixin, Secure^m. Then, key- or book-specific information is composed with
Secure^m to produce Locked^m and Magic^m, as shown in Figure 10. Each of the new
mixins extends Door^i since the right-hand mixin in the composition, Secure^m,
extends Door^i.

The Locked^m and Magic^m mixins can also be composed to form LockedMagic^m.
This mixin has the expected behavior: to open an instance of LockedMagic^m, the
player must have both the key and the book of spells. This combinational effect
is achieved by a chain of super.canOpen() calls that use distinct, non-interfering
versions of neededItem. The neededItem declarations of Locked^m and Magic^m do
not interfere with each other because the interface extended by Locked^m is Door^i,
which does not contain neededItem. In contrast, Door^i does contain canOpen, so
the canOpen method in Locked^m overrides and chains to the canOpen in Magic^m.
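The non-interference of the two neededItem declarations can be imitated in plain Java with one wrapper per layer. This is a hedged sketch rather than the MixedJava mechanism: `SecureSketch`, `BasicDoor`, and the item strings are hypothetical, and the inventory is modeled as a set of strings. Because `neededItem` is not part of the `Door` interface, each layer can only see its own version, while `canOpen` chains through every layer.

```java
import java.util.Set;

final class SecureSketch {
    /** Only canOpen is visible across layers; neededItem is deliberately absent. */
    interface Door {
        boolean canOpen(Set<String> inventory);
    }

    static final class BasicDoor implements Door {
        public boolean canOpen(Set<String> inventory) { return true; }
    }

    /** One security layer: checks its own needed item, then defers to the layer below. */
    static final class Secure implements Door {
        private final Door parent;
        private final String item;

        Secure(Door parent, String item) {
            this.parent = parent;
            this.item = item;
        }

        /** Local to this layer, so the key layer and the spell layer never interfere. */
        String neededItem() { return item; }

        public boolean canOpen(Set<String> inventory) {
            return inventory.contains(neededItem()) && parent.canOpen(inventory);
        }
    }

    static Door locked(Door d) { return new Secure(d, "key"); }

    static Door magic(Door d) { return new Secure(d, "spell book"); }

    /** Locked composed with Magic, applied to a basic door. */
    static Door lockedMagicDoor() { return locked(magic(new BasicDoor())); }

    public static void main(String[] args) {
        Door door = lockedMagicDoor();
        System.out.println(door.canOpen(Set.of("key", "spell book"))); // both items present
        System.out.println(door.canOpen(Set.of("key")));               // spell book missing
    }
}
```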



4 Mixins for Java 

Mixed Java is an extension of Classic Java with mixins. In Classic Java, a 
class is assembled as a chain of class expressions. Specifically, the content of a

    interface Secure^i extends Door^i {
      Object neededItem();
    }

    mixin Secure^m extends Door^i implements Secure^i {
      Object neededItem() { return null; }
      boolean canOpen(Person^c p) {
        Object item = neededItem();
        if (!p.hasItem(item)) {
          System.out.println("You don't have the " + item);
          return false;
        }
        System.out.println("Using " + item);
        return super.canOpen(p);
      }
    }

    mixin NeedsKey^m extends Secure^i {
      Object neededItem() {
        return theKey;
      }
    }

    mixin NeedsSpell^m extends Secure^i {
      Object neededItem() {
        return theSpellBook;
      }
    }

    mixin Locked^m = NeedsKey^m compose Secure^m;
    mixin Magic^m = NeedsSpell^m compose Secure^m;
    mixin LockedMagic^m = Locked^m compose Magic^m;
    mixin LockedMagicDoor^m = LockedMagic^m compose Door^m;
    class LockedDoor^c = Locked^m(Door^c); ...

Fig. 10. Composing mixins for localized parameterization

[Diagram: the atomic mixins that make up Locked^m, Magic^m, LockedMagic^m, and LockedMagicDoor^m, drawn as a chain of boxes]

Fig. 11. The LockedMagicDoor^m mixin corresponds to a sequence of atomic mixins



class is defined by its immediate field and method declarations and by the
declarations of its superclasses, up to Object. In MixedJava, a “class” is assembled
by composing a chain of mixins. The content of the class is defined by the field
and method declarations in the entire chain.

Footnote: We use boldfaced class to refer to the content of a single class expression, as opposed
to an actual class.

Mixed Java provides two kinds of mixins:

— An atomic mixin declaration is similar to a class declaration. An atomic 
mixin declares a set of fields and methods that are extensions to some in- 
herited set of fields and methods. In contrast to a class, an atomic mixin 
specifies its inheritance with an inheritance interface, not a static connec- 
tion to an existing class. By abuse of terminology, we say that a mixin extends 
its inheritance interface. 

A mixin’s inheritance interface determines how method declarations within
the mixin are combined with inherited methods. If a mixin declares a method
x that is not contained in its inheritance interface, then that declaration never
overrides another x.

An atomic mixin implements one or more interfaces as specified in the 
mixin’s definition. In addition, a mixin always implements its inheritance 
interface. 

— A composite mixin does not declare any new fields or methods. Instead, 
it composes two existing mixins to create a new mixin. The new composite 
mixin has all of the fields and methods of its two constituent mixins. Method 
declarations in the left-hand mixin override declarations in the right-hand 
mixin according to the left-hand mixin’s inheritance interface. Composition 
is allowed only when the right-hand mixin implements the left-hand mixin’s 
inheritance interface. 

A composite mixin extends the inheritance interface of its right-hand
constituent, and it implements all of the interfaces that are implemented by its
constituents. Composite mixins can be composed with other mixins, producing
arbitrarily long chains of atomic mixin compositions.

Figure 11 illustrates how the mixin LockedMagicDoor^m from the previous
section corresponds to a chain of atomic mixins. The arrows connecting the tops
of the boxes represent mixin compositions; in each composition, the inheritance
interface for the left-hand side is noted above the arrow. The other arrows show
how method declarations in each mixin override declarations in other mixins
according to the composition interfaces. For example, there is no arrow from
the first Secure^m’s neededItem to Magic^m’s method because neededItem is not
included in the Door^i interface. The canOpen method is in both Door^i and Secure^i,
so that corresponding arrows connect all declarations of canOpen.
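
The correspondence between a composite mixin and its chain of atomic mixins can be illustrated with a small Java model. This is only a sketch, not part of MixedJava: the class names MixinExpr, AtomicExpr, and ComposeExpr are hypothetical, and the model tracks mixin names only, ignoring fields, methods, and interfaces.

```java
import java.util.ArrayList;
import java.util.List;

/** A mixin is either atomic or the composition of two mixins (hypothetical model). */
abstract class MixinExpr {
    final String name;

    MixinExpr(String name) {
        this.name = name;
    }

    /** Returns the names of the atomic mixins in this mixin, left to right. */
    abstract List<String> chain();
}

/** An atomic mixin contributes exactly one element to a chain. */
final class AtomicExpr extends MixinExpr {
    AtomicExpr(String name) {
        super(name);
    }

    @Override
    List<String> chain() {
        List<String> result = new ArrayList<>();
        result.add(name);
        return result;
    }
}

/** A composite mixin appends the chain of its right constituent to that of its left. */
final class ComposeExpr extends MixinExpr {
    final MixinExpr left;
    final MixinExpr right;

    ComposeExpr(String name, MixinExpr left, MixinExpr right) {
        super(name);
        this.left = left;
        this.right = right;
    }

    @Override
    List<String> chain() {
        List<String> result = new ArrayList<>(left.chain());
        result.addAll(right.chain());
        return result;
    }
}
```

Building LockedMagicDoor^m from the declarations in Figure 10 in this model yields the five-element chain NeedsKey, Secure, NeedsSpell, Secure, Door, in which Secure occurs twice.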

Mixins completely subsume the role of classes. A mixin can be instantiated
with new when the mixin does not inherit any services. In MixedJava, this
is indicated by declaring that the mixin extends the special interface Empty.

Our composition operator is associative semantically, but not type-theoretically. The
type system could be strengthened to make composition associative (giving MixedJava
a categorical flavor) by letting each mixin declare a set of interfaces for inheritance,
rather than a single interface. Each required interface must then either be
satisfied or propagated by a composition. We have not encountered a practical use
for the extended type system.

defn = mixin m extends i implements i* { field* meth* }
     | mixin m = m compose m
     | interface i extends i* { meth* }
e    = new m | var | null | e : m .fd | e : m .fd = e
     | e.md (e*) | super ≡ this .md (e*)
     | view t as t e | let var = e in e
m    = mixin name
t    = m | i

Fig. 12. Syntax extensions for MixedJava



Consequently, we omit classes from our model of mixins, even though a realistic 
language would include both mixins and classes. 

The following subsections present a precise description of MixedJava.
Section 4.1 describes the syntax and type structure of MixedJava programs,
followed by the type elaboration rules in Section 4.2. Section 4.3 explains the
operational semantics of MixedJava, which is significantly different from that of
ClassicJava. Section 4.4 presents a type soundness theorem, Section 4.5 briefly
considers implementation issues, and Section 4.6 discusses related work.



4.1 MixedJava Programs 

Figure 12 contains the syntax for MixedJava; the missing productions are
inherited from the grammar of ClassicJava in Figure 3. The primary change
to the syntax is the replacement of class declarations with mixin declarations.
Another change is in the annotations added by type elaboration. First, view
expressions are annotated with the source type of the expression. Second, a type
is no longer included in the super annotation. Type elaboration also inserts
extra view expressions into a program to implement subsumption.

The predicates and relations in Figure 13 (along with the interface-specific
parts of the corresponding ClassicJava figure) summarize the syntactic content of a
MixedJava program. A well-formed program induces a subtype relation ≤^m_P on its
mixins such that a composite mixin is a subtype of each of its constituent mixins.

Since each composite mixin has two supertypes, the type graph for mixins
is a DAG, rather than a tree as for classes. This DAG can lead to ambiguities if
subsumption is based on subtypes. For example, LockedMagic^m is a subtype of
Secure^m, but it contains two copies of Secure^m (see Figure 11), so an instance
of LockedMagic^m is ambiguous as an instance of Secure^m. More concretely, the
fragment

LockedMagicDoor^m door = new LockedMagicDoor^m;
(view Secure^m door).neededItem();

is ill-formed because LockedMagic^m is not viewable as Secure^m. The “viewable
as” relation ⊴_P is a restriction on the subtype relation that eliminates
ambiguities. Subsumption is thus based on ⊴_P rather than ≤_P. The relations ∈_P,
which collect the fields and methods contained in each mixin, similarly eliminate
ambiguities.

MixinsOnce(P)            Each mixin name is declared only once
    ∀m,m'   mixin m ··· mixin m' ··· is in P  ⟹  m ≠ m'
FieldOncePerMixin(P)     Field names in each mixin declaration are unique
    ∀fd,fd'   mixin ··· { ··· fd ··· fd' ··· } is in P  ⟹  fd ≠ fd'
MethodOncePerMixin(P)    Method names in each mixin declaration are unique
    ∀md,md'   mixin ··· { ··· md (···) { ··· } ··· md' (···) { ··· } ··· } is in P  ⟹  md ≠ md'
NoAbstractMixins(P)      Methods in a mixin are not abstract
    ∀md,e   mixin ··· { ··· md (···) { e } ··· } is in P  ⟹  e ≠ abstract
≺^m_P                    Mixin declares an inheritance interface
    m ≺^m_P i  ⟺  mixin m extends i ··· { ··· } is in P
≺≺^m_P                   Mixin declares implementation of an interface
    m ≺≺^m_P i  ⟺  mixin m ··· implements ··· i ··· { ··· } is in P
≐^m_P                    Mixin is declared as a composition
    m ≐^m_P m' ∘ m''  ⟺  mixin m = m' compose m'' is in P
∈∈^m_P                   Method is declared in a mixin
    ⟨md, (t1 ... tn → t), (var1 ... varn), e⟩ ∈∈^m_P m
      ⟺  mixin m ··· { ··· t md (t1 var1 ... tn varn) { e } ··· } is in P
∈∈^m_P                   Field is declared in a mixin
    ⟨m.fd, t⟩ ∈∈^m_P m  ⟺  mixin m ··· { ··· t fd ··· } is in P
≤^m_P                    Mixin is a submixin
    m ≤^m_P m'  ⟺  m = m' or (∃m'',m''' s.t. m ≐^m_P m'' ∘ m''' and (m'' ≤^m_P m' or m''' ≤^m_P m'))
⊴^m_P                    Mixin is viewable as a mixin
    m ⊴^m_P m'  ⟺  m = m' or (∃m'',m''' s.t. m ≐^m_P m'' ∘ m''' and (m'' ⊴^m_P m' xor m''' ⊴^m_P m'))
CompleteMixins(P)        Mixins that are composed are defined
    rng(≐^m_P) ⊆ {m ∘ m' | m,m' ∈ dom(≺^m_P) ∪ dom(≐^m_P)}
WellFoundedMixins(P)     Mixin hierarchy is an order
    ≤^m_P is antisymmetric
CompleteInterfaces(P)    Extended/implemented interfaces are defined
    rng(≺^m_P) ∪ rng(≺≺^m_P) ∪ rng(≺^i_P) ⊆ dom(≺^i_P) ∪ {Empty}
◁^m_P                    Mixin extends an interface
    m ◁^m_P i  ⟺  m ≺^m_P i or (∃m',m'' s.t. m ≐^m_P m' ∘ m'' and m'' ◁^m_P i)
≪^m_P                    Mixin implements an interface
    m ≪^m_P i  ⟺  ∃m',i' s.t. m ≤^m_P m' and i' ≤^i_P i and (m' ≺^m_P i' or m' ≺≺^m_P i')
⊴⊴^m_P                   Mixin is viewable as an interface
    m ⊴⊴^m_P i  ⟺  (∃i' s.t. i' ≤^i_P i and (m ≺^m_P i' or m ≺≺^m_P i'))
                   or (∃m',m'' s.t. m ≐^m_P m' ∘ m'' and (m' ⊴⊴^m_P i xor m'' ⊴⊴^m_P i))
MixinCompositionsOK(P)   Mixins are composed safely
    ∀m,m',m''   m ≐^m_P m' ∘ m''  ⟹  ∃i s.t. m' ◁^m_P i and m'' ≪^m_P i
:: and @                 Sequence constructors
    :: adds an element to the beginning of a sequence; @ appends two sequences
⟶^m_P                    Mixin corresponds to a chain of atomic mixins
    m ⟶^m_P M  ⟺  (∃i s.t. m ≺^m_P i and M = [m])
                  or (∃m',m'',M',M'' s.t. m ≐^m_P m' ∘ m'' and m' ⟶^m_P M'
                      and m'' ⟶^m_P M'' and M = M'@M'')
≤^M                      Views have an inverted subsequence order
    M ≤^M M'  ⟺  ∃M'' s.t. M = M''@M'

Table continues in Figure 14

Fig. 13. Predicates and relations in the model of MixedJava
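
The difference between the submixin relation and the “viewable as” relation is the use of xor in place of or. A minimal Java sketch of the two definitions, restricted to mixin names, makes the effect visible on the door example. The class MixinRelations and its method names are hypothetical and are not part of the MixedJava model.

```java
import java.util.HashMap;
import java.util.Map;

/** Records "mixin m = left compose right" declarations and answers relation queries. */
final class MixinRelations {
    // Maps a composite mixin name to its {left, right} constituents.
    private final Map<String, String[]> compositions = new HashMap<>();

    void declareComposition(String name, String left, String right) {
        compositions.put(name, new String[] {left, right});
    }

    /** Submixin: m equals target, or either constituent is a submixin of target. */
    boolean isSubmixin(String m, String target) {
        if (m.equals(target)) {
            return true;
        }
        String[] parts = compositions.get(m);
        return parts != null
            && (isSubmixin(parts[0], target) || isSubmixin(parts[1], target));
    }

    /** Viewable as: like submixin, but exactly one constituent may lead to target. */
    boolean isViewableAs(String m, String target) {
        if (m.equals(target)) {
            return true;
        }
        String[] parts = compositions.get(m);
        return parts != null
            && (isViewableAs(parts[0], target) ^ isViewableAs(parts[1], target));
    }
}
```

With the compositions of Figure 10 declared, LockedMagic is a submixin of Secure but is not viewable as Secure, because both Locked and Magic lead to a copy of Secure. Locked alone is viewable as Secure.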



4.2 MixedJava Type Elaboration

Despite replacing the subtype relation with the “viewable as” relation for
subsumption, ClassicJava’s type elaboration strategy applies equally well for
typing MixedJava. The typing rules in Figure 15 are combined with the defn^i,
meth, let, var, null, and abs rules from ClassicJava.

MixinMethodsOK(P)        Method definitions match inheritance interface
    ∀m,i,e,md,T,T',V,V'
      (⟨md, T, V, e⟩ ∈∈^m_P m and ⟨md, T', V', abstract⟩ ∈^i_P i)  ⟹  (T = T' or m ⋪^m_P i)
∈^m_P                    Field is contained in a mixin
    ⟨m'.fd, t⟩ ∈^m_P m
      ⟺  ∃M,M' s.t. m ⟶^m_P M and ⟨m'.fd, t⟩ ∈∈^m_P m'
          and {m'::M'} = {m'::M' | M ≤^M m'::M' and ∃t' s.t. ⟨m'.fd, t'⟩ ∈∈^m_P m'}
∈^m_P                    Method is contained in a mixin
    ⟨md, T, V, e⟩ ∈^m_P m
      ⟺  ∃M,m',M' s.t. m ⟶^m_P M and ⟨md, T, V, e⟩ ∈∈^m_P m'
          and {m'::M'} = {m'::M' | M ≤^M m'::M' and ∃V',e' s.t. ⟨md, T, V', e'⟩ ∈∈^m_P m'}
MixinsImplementAll(P)    Mixins supply methods to implement interfaces
    ∀m,i   m ≺≺^m_P i  ⟹  (∀md,T,V   ⟨md, T, V, abstract⟩ ∈^i_P i
      ⟹  (∃e s.t. ⟨md, T, V, e⟩ ∈^m_P m
           or ∃i' s.t. (m ≺^m_P i' and ⟨md, T, V, abstract⟩ ∈^i_P i')))
≤_P                      Type is a subtype
    ≤_P = ≤^m_P ∪ ≤^i_P ∪ ≪^m_P
⊴_P                      Type is viewable as another type
    ⊴_P = ⊴^m_P ∪ ≤^i_P ∪ ⊴⊴^m_P
∈_P                      Field or method is in a type
    ∈_P = ∈^m_P ∪ ∈^i_P
·/· ▷ ·                  Mixin selects a view in a chain
    M/m ▷ M'  ⟺  {M'} = { … | m ⟶^m_P M'' and M … }
·/· ▷ ·                  Interface selects a view in a chain
    M/i ▷ M'  ⟺  M' = min{m::M'' | m ≪^m_P i and M ≤^M m::M''}
·/· ∝ ·                  Method in a sequence is the same as in a subsequence
    m::M/md ∝ M'  ⟺  m::M = M' or (∃i,T,V,M'' s.t. m ≺^m_P i and ⟨md, T, V, abstract⟩ ∈^i_P i
                      and M/i ▷ M'' and M''/md ∝ M')
· ∈_P · in ·             Method and view is selected by a view in a chain
    ⟨md, T, V, e, m::M⟩ ∈_P M_v in M_o
      ⟺  ⟨md, T, V, e⟩ ∈∈^m_P m and M_b = max{M' | M_v/md ∝ M'}
          and m::M = min{m::M | M_o ≤^M m::M and m::M/md ∝ M_b
                         and ∃V',e' s.t. ⟨md, T, V', e'⟩ ∈∈^m_P m}

Fig. 14. Predicates and relations continued from Figure 13



Three of the new rules deserve special attention. First, the super^m rule allows
a super call only when the method is declared in the current mixin’s inheritance
interface, where the current mixin is determined by looking at the type of this.
Second, the wcast^m rule strips out the view part of the expression and delegates
all work to the subsumption rules. Third, the sub^m rule for subsumption inserts
a view operator to make subsumption coercions explicit.



4.3 MixedJava Evaluation

The operational semantics for MixedJava differs substantially from that of
ClassicJava. The rewriting semantics of the latter relies on the uniqueness
of each method name in the chain of classes associated with an object. This
uniqueness is not guaranteed for chains of mixins. Specifically, a composition m1
compose m2 contains two methods named x if both m1 and m2 declare x and
m1’s inheritance interface does not contain x. Both x methods are accessible in
an instance of the composite mixin since the object can be viewed specifically
as an instance of m1 or m2.

[prog^m]
    MixinsOnce(P)  MethodOncePerMixin(P)  InterfacesOnce(P)  CompleteMixins(P)
    WellFoundedMixins(P)  CompleteInterfaces(P)  WellFoundedInterfaces(P)  MixinFieldsOK(P)
    MixinMethodsOK(P)  InterfaceMethodsOK(P)  InterfacesAbstract(P)  NoAbstractMixins(P)
    MixinsImplementAll(P)
    P ⊢d defn_j ⇒ defn'_j for j ∈ [1,n]     P,[] ⊢e e ⇒ e' : t
    where P = defn_1 ... defn_n e
    ------------------------------------------------------------
    ⊢p defn_1 ... defn_n e ⇒ defn'_1 ... defn'_n e' : t

[defn^m]
    P ⊢t t_j for each j ∈ [1,n]     P,m ⊢m meth_k ⇒ meth'_k for each k ∈ [1,p]
    ------------------------------------------------------------
    P ⊢d mixin m ··· { t1 fd1 ... tn fdn meth_1 ... meth_p }
       ⇒ mixin m ··· { t1 fd1 ... tn fdn meth'_1 ... meth'_p }

[new^m]
    P ⊢t m     m ◁^m_P Empty
    ------------------------------------------------------------
    P,Γ ⊢e new m ⇒ new m : m

[get^m]
    P,Γ ⊢e e ⇒ e' : m     ⟨m'.fd, t⟩ ∈_P m
    ------------------------------------------------------------
    P,Γ ⊢e e.fd ⇒ e' : m' .fd : t

[set^m]
    P,Γ ⊢e e ⇒ e' : m     ⟨m'.fd, t⟩ ∈_P m     P,Γ ⊢s e_v ⇒ e'_v : t
    ------------------------------------------------------------
    P,Γ ⊢e e.fd = e_v ⇒ e' : m' .fd = e'_v : t

[call^m]
    P,Γ ⊢e e ⇒ e' : t'     ⟨md, (t1 ... tn → t), (var1 ... varn), e_b⟩ ∈_P t'
    P,Γ ⊢s e_j ⇒ e'_j : t_j for j ∈ [1,n]
    ------------------------------------------------------------
    P,Γ ⊢e e.md(e_1 ... e_n) ⇒ e'.md(e'_1 ... e'_n) : t

[super^m]
    P,Γ ⊢e this ⇒ this : m     m ≺^m_P i
    ⟨md, (t1 ... tn → t), (var1 ... varn), abstract⟩ ∈^i_P i
    P,Γ ⊢s e_j ⇒ e'_j : t_j for j ∈ [1,n]
    ------------------------------------------------------------
    P,Γ ⊢e super.md(e_1 ... e_n) ⇒ super ≡ this .md(e'_1 ... e'_n) : t

[wcast^m]
    P,Γ ⊢s e ⇒ e' : t
    ------------------------------------------------------------
    P,Γ ⊢e view t e ⇒ e' : t

[ncast^m]
    P,Γ ⊢e e ⇒ e' : t'
    ------------------------------------------------------------
    P,Γ ⊢e view t e ⇒ view t' as t e' : t

[sub^m]
    P,Γ ⊢e e ⇒ e' : t'     t' ⊴_P t
    ------------------------------------------------------------
    P,Γ ⊢s e ⇒ view t' as t e' : t

[type^m]
    t ∈ dom(≺^m_P) ∪ dom(≐^m_P) ∪ dom(≺^i_P) ∪ {Empty}
    ------------------------------------------------------------
    P ⊢t t

Fig. 15. Context-sensitive checks and type elaboration rules for MixedJava



One strategy to avoid the duplication of x is to rename it in m1 and m2.
At best, this is a global transformation on the program, since x is visible to the
entire program as a public method. At worst, renaming triggers an exponential
explosion in the size of the program, which occurs when m1 and m2 are actually
the same mixin m. Since the mixin m represents a type, renaming x in each
use of m splits it into two different types, which requires type-splitting at every
expression in the program involving m.

Our MixedJava semantics handles the duplication of method names with
run-time context information: the current view of an object. During evaluation,
each reference to an object is bundled with its view of the object, so that values
are of the form ⟨object‖view⟩. A reference’s view can be changed by subsumption,
method calls, or explicit casts.



E = [ ] | E : m .fd | E : m .fd = e | v : m .fd = E
  | E.md(e ...) | v.md(v ... E e ...)
  | super ≡ v .md(v ... E e ...)
  | view t as t E | let var = E in e
e = ... | ⟨object‖M⟩
v = ⟨object‖M⟩ | null

P ⊢ ⟨E[new m], S⟩ ↪ ⟨E[⟨object‖M⟩], S[object ↦ ⟨m, [M1.fd1 ↦ null, ... Mn.fdn ↦ null]⟩]⟩   [new]
    where object ∉ dom(S) and m ⟶^m_P M
    and {M1.fd1, ... Mn.fdn} = {m'::M'.fd | M ≤^M m'::M'
                                 and ∃t s.t. ⟨m'.fd, t⟩ ∈∈^m_P m'}
P ⊢ ⟨E[⟨object‖M⟩ : m' .fd], S⟩ ↪ ⟨E[v], S⟩   [get]
    where S(object) = ⟨m, F⟩ and M/m' ▷ M' and F(M'.fd) = v
P ⊢ ⟨E[⟨object‖M⟩ : m' .fd = v], S⟩ ↪ ⟨E[v], S[object ↦ ⟨m, F[M'.fd ↦ v]⟩]⟩   [set]
    where S(object) = ⟨m, F⟩ and M/m' ▷ M'
P ⊢ ⟨E[⟨object‖M⟩.md(v1, ... vn)], S⟩   [call]
    ↪ ⟨E[e[⟨object‖M'⟩/this, v1/var1, ... vn/varn]], S⟩
    where S(object) = ⟨m, F⟩ and m ⟶^m_P M_o
    and ⟨md, T, (var1 ... varn), e, M'⟩ ∈_P M in M_o
P ⊢ ⟨E[super ≡ ⟨object‖m::M⟩ .md(v1, ... vn)], S⟩   [super]
    ↪ ⟨E[e[⟨object‖M'⟩/this, v1/var1, ... vn/varn]], S⟩
    where m ≺^m_P i and M/i ▷ M'' and ⟨md, T, (var1 ... varn), e, M'⟩ ∈_P M'' in M''
P ⊢ ⟨E[view t' as t ⟨object‖M⟩], S⟩ ↪ ⟨E[⟨object‖M'⟩], S⟩   [view]
    where t' ⊴_P t and M/t ▷ M'
P ⊢ ⟨E[view t' as t ⟨object‖M⟩], S⟩ ↪ ⟨E[⟨object‖M''⟩], S⟩   [cast]
    where t' ⋬_P t and S(object) = ⟨m, F⟩ and m ⊴_P t and m ⟶^m_P M' and M'/t ▷ M''
P ⊢ ⟨E[let var = v in e], S⟩ ↪ ⟨E[e[v/var]], S⟩   [let]
P ⊢ ⟨E[view t' as t ⟨object‖M⟩], S⟩ ↪ ⟨error: bad cast, S⟩   [xcast]
    where t' ⋬_P t and S(object) = ⟨m, F⟩ and m ⋬_P t
P ⊢ ⟨E[null : m .fd], S⟩ ↪ ⟨error: dereferenced null, S⟩   [nget]
P ⊢ ⟨E[null : m .fd = v], S⟩ ↪ ⟨error: dereferenced null, S⟩   [nset]
P ⊢ ⟨E[null.md(v1, ... vn)], S⟩ ↪ ⟨error: dereferenced null, S⟩   [ncall]

Fig. 16. Operational semantics for MixedJava



A view is represented as a chain of mixins. This chain is always a tail of
the object’s full chain of mixins, i.e., the chain of mixins for the object’s
instantiation type. The tail designates a specific point in the full mixin chain for
selecting methods during dynamic dispatch. For example, when an instance of
LockedMagicDoor^m is used as a Magic^m instance, the view of the object is

[NeedsSpell^m Secure^m Door^m].

With this view, a search for the neededItem method of the object begins in the
NeedsSpell^m element of the chain.

A view is analogous to a “subobject” in languages with multiple inheritance, but
without the complexity of shared superclasses.

The first phase of a search for some method x locates the base declaration
of x, which is the unique non-overriding declaration of x that is visible in the
current view. This declaration is found by traversing the view from left to right,
using the inheritance interface at each step as a guide for the next step (via the
∝ and ▷ relations). When the search reaches a mixin whose inheritance interface
does not include x, the base declaration of x has been found. But the base
declaration is not the destination of the dispatch; the destination is an overriding
declaration of x for this base that is contained in the object’s instantiated mixin.
Among the declarations that override this base, the leftmost declaration is
selected as the destination. The location of that overriding declaration determines
both the method definition that is invoked and the view of the object (i.e., the
representation of this) within the destination method body. This dispatching
algorithm is encoded in the · ∈_P · in · relation.
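
The two-phase search can be sketched in Java over a chain of atomic mixins, with a view represented as the index at which its tail starts. This is an illustrative model under stated assumptions, not the formal relation: the classes ChainElement and ViewDispatcher are hypothetical, interfaces are reduced to sets of method names, the set of implemented interfaces is supplied by hand for each mixin, and an interface selects the leftmost implementing mixin in the remaining chain. The superDispatch method applies the same search to the tail that follows the current view.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** One atomic mixin in an object's chain (hypothetical model). */
final class ChainElement {
    final String name;
    final String inheritanceInterface; // the interface this mixin extends
    final Set<String> implemented;     // interfaces the mixin implements
    final Set<String> declared;        // method names declared in the mixin

    ChainElement(String name, String inheritanceInterface,
                 Set<String> implemented, Set<String> declared) {
        this.name = name;
        this.inheritanceInterface = inheritanceInterface;
        this.implemented = implemented;
        this.declared = declared;
    }
}

/** Finds the destination of a call, given a view into a chain of atomic mixins. */
final class ViewDispatcher {
    private final List<ChainElement> chain;
    private final Map<String, Set<String>> interfaceMethods;

    ViewDispatcher(List<ChainElement> chain, Map<String, Set<String>> interfaceMethods) {
        this.chain = chain;
        this.interfaceMethods = interfaceMethods;
    }

    /** Leftmost position at or after 'from' whose mixin implements iface, or -1. */
    int select(int from, String iface) {
        for (int p = from; p < chain.size(); p++) {
            if (chain.get(p).implemented.contains(iface)) {
                return p;
            }
        }
        return -1;
    }

    /** True if the inheritance interface of the mixin at p contains md. */
    private boolean inherits(int p, String md) {
        Set<String> methods = interfaceMethods.get(chain.get(p).inheritanceInterface);
        return methods != null && methods.contains(md);
    }

    /** Phase 1: follow inheritance interfaces rightward to the base declaration. */
    int findBase(int view, String md) {
        int p = view;
        while (inherits(p, md)) {
            int next = select(p + 1, chain.get(p).inheritanceInterface);
            if (next < 0) {
                throw new IllegalStateException("chain does not supply " + md);
            }
            p = next;
        }
        return p;
    }

    /** True if the search that starts at p arrives at the base position. */
    private boolean reaches(int p, String md, int base) {
        while (p >= 0 && p != base && inherits(p, md)) {
            p = select(p + 1, chain.get(p).inheritanceInterface);
        }
        return p == base;
    }

    /** Phase 2: leftmost declaration at or after 'start' that overrides the base. */
    int dispatch(int start, int view, String md) {
        int base = findBase(view, md);
        for (int p = start; p <= base; p++) {
            if (chain.get(p).declared.contains(md) && reaches(p, md, base)) {
                return p;
            }
        }
        return -1; // no declaration found
    }

    /** A super call from the mixin at 'view' searches only the tail that follows it. */
    int superDispatch(int view, String md) {
        int tail = select(view + 1, chain.get(view).inheritanceInterface);
        return tail < 0 ? -1 : dispatch(tail, tail, md);
    }
}
```

For the chain NeedsKey, Secure, NeedsSpell, Secure, Door, a neededItem call through the view that starts at the first Secure resolves to NeedsKey, while the same call through the view that starts at the second Secure resolves to NeedsSpell.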

The dispatching algorithm explains how Secure^m’s canOpen method calls
the appropriate neededItem method in an instance of LockedMagicDoor^m,
sometimes dispatching to the method in NeedsKey^m and sometimes to the one in
NeedsSpell^m. The following example illustrates the essence of dispatching from
Secure^m’s canOpen:

Object canOpen(Secure^m o) { ... o.neededItem() ... }

let door = new LockedMagicDoor^m
in canOpen(view Secure^m view Locked^m door) ...
   canOpen(view Secure^m view Magic^m door)

The new LockedMagicDoor^m expression produces door as an ⟨object‖view⟩ pair,
where object is a new object in the store and view is (recall Figure 11)

[NeedsKey^m Secure^m NeedsSpell^m Secure^m Door^m].

The view expressions shift the view part of door. Thus, for the first call to
canOpen, o is replaced by a reference with the view

[Secure^m NeedsSpell^m Secure^m Door^m].

In this view, the base declaration of neededItem is in the leftmost Secure^m since
neededItem is not in the interface extended by Secure^m. The overriding
declaration is in NeedsKey^m, which appears to the left of Secure^m in the instantiated
chain and extends an interface that contains neededItem.

In contrast, the second call to canOpen receives a reference with the view

[Secure^m Door^m].

In this view, the base definition of neededItem is in the rightmost Secure^m of
the full chain, and it is overridden in NeedsSpell^m. Neither the definition of
neededItem in NeedsKey^m nor the one in the leftmost occurrence of Secure^m is a
candidate relative to the given view, because Secure^m extends an interface that
hides neededItem.

MixedJava not only differs from ClassicJava with respect to method
dispatching, but also in its treatment of super. In MixedJava, super dispatches
are dynamic, since the “supermixin” for a super expression is not statically
known. The super dispatch for mixins is implemented like regular dispatches
with the · ∈_P · in · relation, but using a tail of the current view in place of both the
instantiation and view chains; this ensures that a method is selected from the
leftmost mixin that follows the current view.

Figure 16 contains the complete operational semantics for MixedJava as
a rewriting system on expression-store pairs, like the class semantics described
earlier for ClassicJava. In this semantics, an object in the store is tagged with a mixin
instead of a class, and the values are null and ⟨object‖view⟩ pairs.

4.4 MixedJava Soundness 

The type soundness theorem for MixedJava is mutatis mutandis the same as the
soundness theorem described earlier for ClassicJava. To prove the
soundness theorem, we introduce a conservative extension, MixedJava', which
is defined by revising some of the MixedJava relations (see Figure 17).

In the extended language, the subtype relation is used directly for the
“viewable as” relation without eliminating ambiguities. Thus, MixedJava' allows
coercions and method calls that are rejected as ambiguous in MixedJava. This
makes MixedJava' less suitable as a programming language, but simplifies the
proof of a type soundness theorem. The soundness theorem for MixedJava'
applies to MixedJava by the following two lemmas:

1. Every MixedJava program is a MixedJava' program.

2. P ⊢ ⟨e, S⟩ ↪ ⟨e', S'⟩ in MixedJava
   ⟹ P ⊢ ⟨e, S⟩ ↪ ⟨e', S'⟩ in MixedJava'.

The proof of the soundness theorem is divided into two parts: we first sketch
the soundness of MixedJava', then show why this result applies to MixedJava.



Type Soundness of MixedJava'. To prove the soundness of MixedJava',
we must first update the type of the environment and the environment-store
consistency relation (⊢σ) to reflect the differences between ClassicJava and
MixedJava'. In MixedJava', the environment Γ maps ⟨object‖M⟩ pairs to the
mixin type M. The updated consistency relation is defined as follows:

Definition 11 (Environment-Store Consistency).

P,Γ ⊢σ S
  ⟺ (S(object) = ⟨m, F⟩
  Σ1:  ⟹ m ≤_P Γ(object)
  Σ2:  and dom(F) = {m'::M'.fd | |m| ≤^M m'::M' and
                      ∃t s.t. ⟨m'.fd, t⟩ ∈∈_P m'}
  Σ3:  and rng(F) ⊆ dom(S) ∪ {null}
  Σ4:  and (F(m'::M'.fd) = object' and
            |m| ≤^M m'::M' and ⟨m'.fd, t⟩ ∈∈_P m')
           ⟹ ((S(object') = ⟨m'', F'⟩) ⟹ m'' ≤_P t))
  Σ5:  and ⟨object‖_⟩ ∈ dom(Γ) ⟹ object ∈ dom(S)
  Σ6:  and object ∈ dom(S) ⟹ ⟨object‖_⟩ ∈ dom(Γ).

≤_P              Type is a subtype
    Extended for views: M ≤_P m ⟺ M contains m’s sequence;
                        M ≤_P i ⟺ M contains an m s.t. m ≤_P i
⊴_P              Type is viewable as another type
    ⊴_P = ≤_P
∈_P              Field or method is contained in a type
    Choose the leftmost field/method instance
·/· ▷ ·          Mixin selects a view in a chain
    Choose the leftmost instance in the chain
· ∈_P · in ·     Method and view is selected by a view in a chain
    Choose the minimum view with a method

Fig. 17. Revised relations for MixedJava'

The statements of the theorems and lemmata remain unchanged, but the 
proofs must be adjusted for differences between the two languages. We show 
how the subject reduction lemma is updated; the remaining proofs change along 
similar lines. 

To prove the type soundness of MixedJava', we must establish that field
accesses and method invocations that have passed the type-checker will not fail
at run-time. The salient differences in the proof of the subject reduction lemma
are:

Case [get]. The typing rules show that P,Γ ⊢e ⟨object‖M⟩ : m' .fd : t1, where
⟨m'.fd, t1⟩ ∈_P Γ(⟨object‖M⟩). By Σ2, object has the field m'.fd. The rest of
the proof follows as for ClassicJava.

Case [call]. Γ(⟨object‖M⟩) = M combined with [call^m] shows that the method
is in M. The search algorithm seeks out the base class of the method
definition, and then the leftmost definition of the method in the instantiated
mixin. Since the search algorithm (· ∈_P · in ·) follows interfaces in both
directions, we know that the method must exist. Further, both the
“downward” and “upward” searches are type-preserving, since method
overriding must preserve type (by MixinMethodsOK). Thus, the invoked method
must exist and must have the same type. The rest of the proof is similar to
that for ClassicJava.

The proof for the remaining language features is similar to the corresponding
proofs for ClassicJava.



Relationship Between MixedJava and MixedJava'. Since the revised
relations for MixedJava' are conservative extensions of those for MixedJava, it
is easy to see that every MixedJava program is also a MixedJava' program.
What remains to be shown is that for programs common to both languages,
their evaluators produce analogous configurations for each reduction step.

The crucial difference between the languages is, for a given expression, which
field or method is chosen by the run-time system of each language. Whereas in
MixedJava the choice is unique (this is ensured by the “viewable as” relation,
⊴_P), MixedJava' allows implicit and explicit views that can result in ambiguity,
and then chooses the leftmost entity (in the linearization) from the set of options.
These differences are captured in the ∈_P, ·/· ∝ · and · ∈_P · in · relations.

Since we are only concerned with programs common to the two languages, we
can ignore programs that select views that result in ambiguity. In the remaining
programs there is only one field or method to be picked at each stage, which
is also the leftmost choice. Hence the two evaluators coincide by making the
same choices. As a result, they compute the same answers, and can be used
interchangeably for programs common to the two languages. This establishes
that the type soundness of MixedJava' applies to MixedJava.

4.5 Implementation Considerations 

The MixedJava semantics is formulated at a high level, leaving open the
question of how to implement mixins efficiently. Common techniques for
implementing classes can be applied to mixins, but two properties of mixins require new
implementation strategies. First, each object reference must carry a view of the
object. This can be implemented using double-wide references, one half for the
object pointer and the other half for the current view. Second, method
invocation depends on the current view as well as the instantiation mixin of an object,
as reflected in the · ∈_P · in · relation. Nevertheless, this relation determines a static,
per-mixin method table that is analogous to the virtual method tables typically
generated for classes.

The overall cost of using mixins instead of classes is equivalent to the cost
of using interface-typed references instead of class-typed references. The
justification for this cost is that mixins are used to implement parts of a program
that cannot be easily expressed using classes. In a language that provides both
classes and mixins, portions of the program that do not use mixins do not incur
any extra overhead.
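
The two implementation strategies can be pictured with a small Java sketch. It is an assumption-laden illustration, not a description of an existing implementation: the classes ViewRef and MixinMethodTable are hypothetical, a view is encoded as an index into the object's chain of atomic mixins, methods are numbered from zero, and the resolver argument stands in for whatever procedure computes the dispatch relation when the table is built.

```java
import java.util.function.IntBinaryOperator;

/** A "double-wide" reference: the object plus the current view into its chain. */
final class ViewRef {
    final Object target;
    final int view; // index of the first mixin of the view in the object's chain

    ViewRef(Object target, int view) {
        this.target = target;
        this.view = view;
    }

    /** Changing the view yields a new reference to the same object. */
    ViewRef withView(int newView) {
        return new ViewRef(target, newView);
    }
}

/** A per-mixin table, computed once, from (view, method) to the destination mixin. */
final class MixinMethodTable {
    private final int[][] destination;

    MixinMethodTable(int chainLength, int methodCount, IntBinaryOperator resolver) {
        destination = new int[chainLength][methodCount];
        for (int view = 0; view < chainLength; view++) {
            for (int method = 0; method < methodCount; method++) {
                // The resolver runs only here, when the table is built.
                destination[view][method] = resolver.applyAsInt(view, method);
            }
        }
    }

    /** Dispatch is a two-index array lookup; no search happens at call time. */
    int lookup(ViewRef ref, int method) {
        return destination[ref.view][method];
    }
}
```

In this sketch a view change copies the reference rather than mutating it, so two references to the same object can carry different views at the same time.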

4.6 Related Work on Mixins 

Mixins first appeared as a CLOS programming pattern. Unfortunately,
the original linearization algorithm for CLOS’s multiple inheritance breaks the
encapsulation of class definitions, which makes it difficult to use CLOS for
proper mixin programming. The CommonObjects dialect of CLOS supports
multiple inheritance without breaking encapsulation, but the language does not
provide simple composition operators for mixins.

Bracha has investigated the use of “mixin modules” as a general language for
expressing inheritance and overriding in objects. His system is based on
earlier work by Cook; its underlying semantics was recently reformulated in
categorical terms by Ancona and Zucca. Bracha’s system gives the
programmer a mechanism for defining modules (classes, in our sense) as a collection of
attributes (methods). Modules can be combined into new modules through
various merging operators. Roughly speaking, these operators provide an assembly
language for expressing class-to-class functions and, as such, permit programmers
to construct mixins. However, this language forces the programmer to resolve
attribute name conflicts manually and to specify attribute overriding explicitly
at a mixin merge site. As a result, the programmer is faced with the same
problem as in Common Lisp, i.e., the low-level management of details. In contrast,
our system provides a language to specify both the content of a mixin and its
interaction with other mixins for mixin compositions. The latter gives each mixin
an explicit role in the construction of programs so that only sensible mixin
compositions are allowed. It distinguishes method overriding from accidental name
collisions and thus permits the system to resolve name collisions automatically
in a natural manner.

5 Conclusion 

We have presented a programming language of mixins that relies on the same 
intuition as single inheritance classes. Indeed, a mixin declaration in our lan- 
guage hardly differs from a class declaration since, from the programmer’s local 
perspective, there is little difference between knowing the properties of a super- 
class as described by an interface and knowing the exact implementation of a 
superclass. However, from the programmer’s global perspective, mixins free each 
collection of field and method extensions from the tyranny of a single superclass, 
enabling new abstractions and increasing the re-use potential of code. 

While using mixins is inherently more expensive than using classes (because 
mixins enforce the distinction between implementation inheritance and subtyp- 
ing), the cost is reasonable and offset by gains in code re-use. Future work on 
mixins must focus on exploring compilation strategies that lower the cost of 
mixins, and on studying how designers can exploit mixins to construct better 
design patterns. 

Acknowledgements: Thanks to Corky Cartwright, Robby Findler, Cormac
Flanagan, and Dan Friedman for their comments on early drafts of this paper.



Abstract. In this chapter we formally specify a subset of Java Virtual
Machine (JVM) instructions for objects, methods and subroutines based
on the official JVM Specification, the official Java Language Specification
and Sun’s JDK 1.1.4 implementation of the JVM. Our formal
specification describes the runtime behaviors of the instructions in relevant
memory areas as state transitions and most structural and linking
constraints on the instructions as a static typing system. The typing system
includes a core of the Bytecode Verifier and resembles data-flow analysis.
We state some properties based on our formal specification and sketch
the proofs. One of these properties is that if a JVM program is statically
well-typed with respect to the typing system, then the runtime data of
the program will be type-correct. Our formal specification clarifies some
ambiguities and incompleteness and removes some (in our view)
unnecessary restrictions in the description of the official JVM Specification.


1 Introduction 

The Java Virtual Machine (JVM) is a platform-independent abstract computing
machine containing an instruction set and running on various memory areas.
The JVM is typically used as an intermediate machine in the implementation
of the programming language Java. The official JVM Specification by Lindholm
and Yellin (OJVMS) defines the syntax of the instructions and describes the
semantics of the instructions in related memory areas.

This chapter specifies a subset of the instructions for objects, methods and
subroutines by giving a formal semantics to them. The formal specification is
based on the OJVMS, Sun’s JDK 1.1.4 implementation of the JVM (in
particular, the Bytecode Verifier), and the official Java Language Specification by
Gosling, Joy and Steele (OJLS). The formal specification provides a
foundation for exposing the behaviors of the subset of the JVM. Since programs
of the instructions in the JVM can be used directly over the Web, our formal
specification defines parts of the security of internet programming in Java.

The formal specification considers the following essential instructions: the
load and store instructions for objects and integers, the object creation
instruction, one operand stack management instruction, several control transfer
instructions, all method invocation instructions, several return instructions, and the jsr
and ret instructions for implementing finally-clauses.

Jim Alves-Foss (Ed.): Formal Syntax and Semantics of Java, LNCS 1523, pp. 271 ff., 1999.

Many features of the JVM are not considered in this chapter. These include
multithreading, arrays, primitive types other than int, two-word wide data, the
class initialization method <clinit>, access control modifiers, exception
handling, native methods, lookupswitch, tableswitch, wide, runtime exceptions,
memory organization, overflow and underflow of the operand stack, the legality
of accesses to local variables, the details of the class file format, the
details of constant pool resolution, and the difference between “static” and
“link time”. We assume that all classes have been loaded by a single class
loader. Due to space constraints we sketch all proofs in this chapter only very
briefly.

The paper [] considers a larger subset of JVM instructions, in particular,
those for exception handling. In addition, it contains the proofs. 

The main ideas of our approach are as follows: 

— We formalize an operational semantics of the instructions by defining each 
instruction as a state transition. 

— At the same time we formulate a static typing system. Based on the typing 
rules in the system, one may try to derive a static type for each memory 
location such that the static type covers the types of all runtime data possibly 
held by the memory location. The typing system characterizes aspects of the 
data-flow analysis (see e.g. []). 

— Our formal specification consists of the state transition machine and the 
typing system. The state transition machine is defined only for programs, 
where static types for all memory locations are derivable with respect to the 
typing system. Practically, the typing system includes a core of the Bytecode 
Verifier. 

— We finally state some properties of the formal specification. In particular, we 
state that if type inference succeeds, then the runtime
data are guaranteed to be type-correct. 

To a large extent, our formal specification follows the OJVMS. However, 
some extensions and changes of the semantics are necessary and desirable. Four 
of them are as follows: 

— The OJVMS (page 130) requires that the static type of an operand stack 
entry or a local variable should be the least upper bound of the types of all 
possible runtime data in it, and the least upper bound should be one JVM 
type. The problem is, however, that the subtyping relation on interfaces 
allows multiple inheritance and thus two interfaces need not have one least 
common superinterface. Our solution is to allow a set of interfaces (and 
classes) to be a static type of an operand stack entry or a local variable. 

— The OJVMS (page 132) uses a special type indicating that an object is new, 
i.e. it has been created by the instruction new but not yet initialized by 
an instance initialization method. We introduce two kinds of special types 
indicating two different stages of object initialization in the specification: 
one indicates that the object is uninitialized; the other indicates that the
object is being initialized by an instance initialization method, but has not
yet encountered the invocation of another instance initialization method. We 
distinguish between these two stages because the objects at different stages 
should be dealt with differently. 

— The OJVMS introduces a concept of subroutines: a jsr instruction jumps 
to or calls a subroutine and a ret instruction returns from a subroutine. 
The mechanism of subroutines is based on the correct use of return ad- 
dresses. The OJVMS defines a new primitive type returnAddress indicat- 
ing that a value is a return address. For the formal specification we refine 
type returnAddress into a family of special types, called subroutine types, 
where a value of a subroutine type is the address of a jsr instruction calling 
a subroutine and thus can be used to compute the return address of the 
subroutine. As we will see, subroutine types are crucial in our specification 
of constraints on jsr and ret instructions. 

— The OJVMS does not clearly distinguish between types for memory locations
and types for runtime data. Our formal specification clearly distinguishes
between static types for memory locations and types (or tags) of runtime
data. Therefore, we can formally discuss the type safety of runtime data
during execution.



In this chapter we use the following notations.

We use the notation $\overline{\alpha_n}$ to denote $n$ syntactical objects
$\alpha_1, \cdots, \alpha_n$, the notation $\{\cdots\}$ to denote a set, and
define $size(\{\overline{\alpha_n}\}) := n$.

We use $\{\overline{\alpha_n \mapsto \alpha'_n}\}$, where
$\alpha_i \neq \alpha_j$ holds for all $i, j$ with $0 < i, j \le n$ and
$i \neq j$, to denote a mapping, where the mapping of each $\alpha_i$ is
$\alpha'_i$, and the mapping of every other element will be defined in each
concrete case. In fact, in each concrete case, the mapping of every other
element will always be either the element itself, or a special value failure,
or not explicitly defined because it is never used. We define
$\mathcal{D}om(\{\overline{\alpha_n \mapsto \alpha'_n}\}) := \{\overline{\alpha_n}\}$.
For a mapping $\theta$, we use $\theta(\alpha)$ to denote the result of the
mapping for $\alpha$, and write $\theta[\alpha \mapsto \alpha']$ for the mapping
that is equal to $\theta$ except that it maps $\alpha$ to $\alpha'$. For a set
$D$, we define

$\theta|_{D} := \{\alpha \mapsto \theta(\alpha) \mid \alpha \in \mathcal{D}om(\theta) \cap D\}$

$\theta|_{-D} := \{\alpha \mapsto \theta(\alpha) \mid \alpha \in \mathcal{D}om(\theta) - D\}$

A list $[\alpha_0, \cdots, \alpha_n]$ with $n \ge -1$ is a special mapping
$\{i \mapsto \alpha_i \mid 0 \le i \le n\}$. For any list lis, we define
$lis + \alpha := lis[size(lis) \mapsto \alpha]$.
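The mapping notations above can be illustrated with a small sketch. It assumes that a mapping is modelled as a Python dict and a list as a dict keyed by index; the function names update, restrict, remove, as_mapping and append are hypothetical and are not part of the specification.

```python
def update(theta, a, a_new):
    """theta[a -> a_new]: equal to theta except that a maps to a_new."""
    result = dict(theta)          # copy, so that theta itself is unchanged
    result[a] = a_new
    return result


def restrict(theta, d):
    """theta|_D: keep only the entries whose argument lies in d."""
    return {a: v for a, v in theta.items() if a in d}


def remove(theta, d):
    """theta|_{-D}: drop the entries whose argument lies in d."""
    return {a: v for a, v in theta.items() if a not in d}


def as_mapping(lis):
    """A list [a_0, ..., a_n] seen as the mapping {i -> a_i}."""
    return dict(enumerate(lis))


def append(lis_map, a):
    """lis + a := lis[size(lis) -> a]."""
    return update(lis_map, len(lis_map), a)


theta = {"x": 1, "y": 2, "z": 3}
lis = as_mapping(["a", "b"])
```

Under this reading, appending to the empty list (the case n = -1) simply yields the mapping {0 -> a}.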



2 Related Work 

Stata and Abadi proposed a type system for a set of instructions focusing on
subroutines and proved its soundness []. Since they considered only a few
instructions, they could provide lengthy proofs and clarify several key semantic
issues about subroutines. Freund and Mitchell made a significant extension of
Stata and Abadi’s type system by considering object initialization [] and, in
doing this, discovered a bug in Sun’s implementation of the bytecode verifier,
which allows a program to use an object before it has been initialized. To fix
the bug, they wrote a typing rule that ensures that at no time during program
execution there may be more than one uninitialized object that is created by
the same new instruction and is usable.

After learning of the bug discovered by Freund and Mitchell, we found that
an early version of the current paper contained the same bug. Except for this
point, the results of the current paper are independent of those by Stata, Abadi, 
Freund and Mitchell. There are several differences between our approach and 
theirs. First, we follow the constraint-solving framework and use typing rules 
to generate constraints that define all legal types. Second, we consider more 
JVM instructions and more details. Two examples are that our approach allows 
an inner subroutine to return directly to an outer jsr instruction in nested 
subroutines, whereas their approach does not, and that upon a subroutine return, 
our approach assigns a type to a local variable using the information on whether 
the local variable is modified in the subroutine, whereas their approach does not 
consider the case. 

Cohen described a formal model of a subset of the JVM, called defensive
JVM (dJVM), where runtime checks are used to assure type-safe execution [].
Our approach is different in that we design a static type inference system, which
assures that statically well-typed programs do not have runtime type errors. In
addition, the current dJVM does not consider subroutines, whereas our
specification does.

Goldberg gave a formal specification of bytecode verification []. Compared
with our work, he considered array types, but not subroutines. In addition, his
formal specification is a dataflow analysis and thus closer to the implementation.

Hagiya presented another type system for subroutines []. One of the
interesting points in his approach is to introduce a mechanism to distinguish the
so-called “used” from the “unused” data in a subroutine. His idea is to use a
kind of special types indicating that a certain memory location in a subroutine
always has the same content as a memory location at a call to the subroutine.

The Kimera project is quite successful in testing some running bytecode
verifiers and detecting some flaws []. In general, testing is often based on a
precise specification. Thus a formal specification may be useful for testing.

Dean [] studied a formal model relating static typing and dynamic linking
and proved the safety of dynamic linking with respect to static typing. As
mentioned before, our formal specification does not consider the relation
between static typing and dynamic linking.

Our formal specification considers only a single class loader. Saraswat
studied static type-(un)safety in Java in the presence of more than one class
loader [].

Although the JVM uses some structures of the Java language, our type
system for the JVM resembles data-flow analysis and thus is quite different from
a formal specification of a type system for Java as in e.g. [].



3 JVM Programs, Methods, Data Areas, and Frames

According to the OJVMS, a byte is 8 bits, and a word is an abstract size that 
is larger than, among others, a byte. One-byte-wide data build instructions, 
whereas one-word wide data represent runtime data. We use byt to range over 
all one-byte data and wrd over all one-word wide data. 

The OJVMS still allows two-word wide integers. But, as mentioned before, 
we do not consider two-word wide data in this chapter for simplicity. 

A (JVM) program in this chapter is defined to contain a set of methods. We
assume that each method has a unique method code reference. We use cod to
range over all method code references. An address is a pair (cod, off), where
off is a one-word wide datum, called a byte offset. For any address (cod, off)
and another byte offset off', we define (cod, off) + off' := (cod, off + off'). An
instruction may be longer than one byte. A program point, denoted by pp, is the
starting address of an instruction. Since we do not consider multithreading, we
assume that there is just one program counter register, which contains the
current program point.

As mentioned in the introduction to this chapter, the program point of a 
jsr instruction may be used in computing the returning program point for a 
subroutine. In fact, it is the byte offset of the program point, not the program 
point itself, that may be used, since, as we will see later, a ret instruction, which 
uses the program point of a jsr instruction, is always in the same method as 
the jsr instruction. Thus we may talk about the byte offset of a jsr in the rest 
of this chapter. 

We consider an arbitrary but fixed program Prg. Note that the methods in 
a program may stem from different class files. A method in Prg consists of all 
instructions in Prg whose program points contain the same given method code 
reference. We use Mth to denote an arbitrary but fixed method in Prg. 

We use allPP(Prg) and allPP(Mth) to denote the sets of all program points
in Prg and Mth, respectively. We assume that allPP(Mth) always contains one
unique element of the form (_, 0). Intuitively, it is the starting program point of
the method.

We define that the function offset(byt1, byt2) yields $(byt1 \cdot 2^8) + byt2$ if it
is a one-word wide value, and a failure otherwise.
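As a minimal sketch of the two definitions above, the following code computes offset(byt1, byt2) and the address addition (cod, off) + off'. The word size of 32 bits, the byte range check and the string used for failure are assumptions made only for this illustration, since the OJVMS leaves the word size abstract.

```python
WORD_BITS = 32        # assumption: the OJVMS leaves the word size abstract
FAILURE = "failure"   # assumption: stand-in for the special value failure


def offset(byt1, byt2):
    """Combine two one-byte values into byt1 * 2**8 + byt2."""
    # Assumption: arguments outside the one-byte range are rejected.
    if not (0 <= byt1 < 2**8 and 0 <= byt2 < 2**8):
        return FAILURE
    value = byt1 * 2**8 + byt2
    # The result must be a one-word wide value, otherwise failure.
    return value if value < 2**WORD_BITS else FAILURE


def add_offset(address, off2):
    """(cod, off) + off' := (cod, off + off')."""
    cod, off = address
    return (cod, off + off2)
```

For example, offset(1, 2) combines the high byte 1 and the low byte 2 into 258, and adding a byte offset never changes the method code reference of an address.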

In our specification an object reference is formally a one-word wide datum.
We use obj to range over all object references. Furthermore, we use null to 
denote a special object reference. 

Following the OJVMS, we formally specify int as the primitive type of all 
one-word wide integers and use val to range over these integers. 

We use cnam, inam, mnam and fnam to range over names of classes, inter- 
faces, methods and fields, respectively. For our formal specification we require 
that fnam is always a qualified name. 

A record is formally a mapping of the form $\{\overline{fnam_n \mapsto wrd_n}\}$,
which maps all elements other than $\overline{fnam_n}$ to a special value failure.
We use rec to range over all records.

The JVM has a heap, from which memory for all objects is allocated.
Formally, a heap state is defined as a mapping of the form
$\{\overline{obj_n \mapsto rec_n}\}$, which maps all elements other than
$\overline{obj_n}$ to a special value failure. We use hp to range over all heap
states.

A frame, which contains a local variable table and an operand stack for the
method, is created each time a method is invoked. A frame is destroyed when
the method completes.

A local variable table state is a list of the form $[wrd_0, \cdots, wrd_n]$ with
$n \ge -1$. We use lvs to range over all local variable table states. Each method
has a fixed number of local variables.

An operand stack state is a list $[wrd_0, \cdots, wrd_n]$ with $n \ge -1$. We use
stk to range over all operand stack states. Each method has a fixed maximal
length of operand stacks.

Note that we need not define formally what a frame is, since no frames are 
explicitly used in our specification. 

Each JVM thread has a Java stack to store at least the old current frame
and a return address upon a method invocation. When the method invocation
completes normally, the old current frame becomes the current frame and the
return address becomes the current program point. In this chapter the Java
stack contains tuples (lvs, stk, pp), where lvs is the old current local variable
table state, stk the old current operand stack state and pp a return address.
Since we do not consider multithreading, we need to consider only one Java
stack. We use jstk to range over all Java stack states.

A program state is a tuple of the form (pp, jstk, lvs, stk, hp). We use stat to
range over all program states.
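The data areas of this section can be pictured with the following sketch. It is only an illustration under stated assumptions: tuples stand for lists, a dict stands for the heap, and the helpers get_field, invoke and return_void are hypothetical names that ignore argument passing and return values, which the actual instruction semantics would have to handle.

```python
from typing import NamedTuple

FAILURE = "failure"   # assumption: stand-in for the special value failure


class ProgramState(NamedTuple):
    pp: tuple      # current program point (cod, off)
    jstk: tuple    # Java stack: tuples (lvs, stk, pp) of suspended callers
    lvs: tuple     # local variable table state
    stk: tuple     # operand stack state
    hp: dict       # heap state: object reference -> record


def get_field(hp, obj, fnam):
    """Heap and record lookup; every other element maps to failure."""
    rec = hp.get(obj, FAILURE)
    if rec == FAILURE:
        return FAILURE
    return rec.get(fnam, FAILURE)


def invoke(state, entry_pp, callee_lvs, return_pp):
    """Save the old current frame and the return address on the Java stack."""
    saved = (state.lvs, state.stk, return_pp)
    return ProgramState(entry_pp, state.jstk + (saved,), callee_lvs, (), state.hp)


def return_void(state):
    """The old current frame becomes current again; pp is the return address."""
    lvs, stk, pp = state.jstk[-1]
    return ProgramState(pp, state.jstk[:-1], lvs, stk, state.hp)
```

Note that no separate frame object appears: as in the text, a frame is represented only implicitly by the pair of a local variable table state and an operand stack state.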



4 Static Types 

Figure 1 defines all static types. In the static analysis, a memory location at a
program point may obtain a static type, indicating the types of the runtime data 
that the memory location may hold at that program point in all executions. For 
simplicity, we may omit the phrases “at a program point” and “in all executions” 
in the rest of this chapter. 



Reference type set     $\{\overline{ref_n}\}$ $(n > 0)$, where each $ref_i$ is either
                       type null, or a class or interface name as in Java.
Primitive type         int
Subroutine type        sbr(pp) | invldsbr
Raw object type        unin(pp, cnam) | init(cnam)
Unusable value type    unusable

Figure 1: Static types

We introduce the static type null. If a memory location may hold nothing
more than the special object reference null, then the memory location may be
given the static type null.

The static type null and class or interface names in Java are called reference
types. Note that java.lang.Object (short: Object) is a class name in Java.

A nonempty reference type set is a static type. Intuitively, if a memory
location may hold nothing more than null and objects that are of the reference
types $ref_i$ for $i = 1, \cdots, n$ but not raw objects (see below), then it may
obtain the static type $\{\overline{ref_n}\}$.

It is worth mentioning that Sun’s implementation does not implement
the concept of reference type sets in the bytecode verifier. 

In our specification, a single reference type is always regarded as identical to 
the singleton set containing the reference type. 

If a memory location may hold nothing more than elements of the primitive 
type int, then it may obtain the static type int. 

As mentioned before, the byte offset of a jsr instruction can be regarded
as an element of the subroutine type corresponding to the called subroutine.
If a memory location may hold nothing more than some valid byte offsets of
jsr instructions that call one common subroutine starting at pp, then the
memory location may obtain a subroutine type sbr(pp) as its static type. Note
that $sbr(pp) \neq sbr(pp')$ if and only if $pp \neq pp'$.

If a memory location may hold some valid and invalid byte offsets of jsr
instructions, then the memory location may obtain the static type invldsbr.

The forms unin(pp, cnam) and init(cnam) are static types for memory
locations holding raw objects. More concretely, if a memory location may hold
nothing more than objects of the class cnam created by one common new
instruction at a program point pp, then the memory location may obtain the static
type unin(pp, cnam). If the memory location may hold nothing more than an
object that is currently being initialized by an instance initialization method for
the class cnam and has not encountered another instance initialization method
within the current instance initialization method, then the memory location may
obtain the static type init(cnam). Note that
$unin(pp, cnam) \neq unin(pp', cnam')$ if and only if $pp \neq pp'$ or
$cnam \neq cnam'$, and $init(cnam) \neq init(cnam')$ if and only if
$cnam \neq cnam'$.

Any memory location may obtain the static type unusable. In particular, 
if a memory location may hold runtime data of incompatible types, then it 
should obtain the static type unusable, indicating that the content of the memory 
location is unusable in practice. For example, if a local variable may hold an 
object and an element of the type int, then our specification will force the
local variable to obtain the static type unusable. 

To represent the above intuitive semantics more precisely, we define a partial
order $\sqsupseteq$ on static types as the smallest reflexive and transitive
relation satisfying that

$\{\overline{ref_n}\} \sqsupseteq \{\overline{ref_m}\}$ for all $n$ and $m$ with $n \le m$ and all $ref_i$, $i = 1, \ldots, m$

$invldsbr \sqsupseteq sbr(pp)$ for all pp

$unusable \sqsupseteq any$ for all static types any

The relation $any \sqsupseteq any'$ is read as “any covers any'”.

Intuitively, if any covers any', then any instruction applicable to a memory
location with any is also applicable to a memory location with any'. Note that
the relation implies that, for example, if any covers both int and a reference
type ref, then any must be unusable.
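The following sketch illustrates part of the covering relation. It encodes static types as tuples, which is an assumption of this illustration, and it deliberately models only reflexivity and the rules for invldsbr and unusable; comparisons between different reference type sets are left out. The helper common_covers is a hypothetical name.

```python
INT = ("int",)
INVLDSBR = ("invldsbr",)
UNUSABLE = ("unusable",)
REF = ("refs", frozenset({"Object"}))   # one sample reference type set


def sbr(pp):
    """Subroutine type for the subroutine starting at program point pp."""
    return ("sbr", pp)


def covers(a, b):
    """True if static type a covers static type b (reference sets: reflexivity only)."""
    if a == b:                               # reflexivity
        return True
    if a == UNUSABLE:                        # unusable covers every static type
        return True
    if a == INVLDSBR and b[0] == "sbr":      # invldsbr covers every sbr(pp)
        return True
    return False


def common_covers(a, b, candidates):
    """All candidate types that cover both a and b."""
    return [c for c in candidates if covers(c, a) and covers(c, b)]


SUB = sbr(("m", 4))
CANDIDATES = [INT, REF, SUB, sbr(("m", 9)), INVLDSBR, UNUSABLE]
```

With these candidates, the only type covering both int and the reference type is unusable, while two different subroutine types are covered by both invldsbr and unusable.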

5 Short Notations for Zero or One of Several Static Types

The syntax in Figure 2 means that an identifier on the left of ::= denotes an
arbitrary static type or the identifier void that either explicitly occurs or is
denoted by an identifier on the right of ::=.

Conceptually, the identifier void is not a static type. It is just an auxiliary 
identifier denoting the situation that no static type is present. 

For example, ref denotes an arbitrary class or interface name or null, tys 
denotes an arbitrary reference type set or a primitive type, notnull_void denotes 
an arbitrary class or interface name or void, and any denotes an arbitrary static 
type. 



Class name               cnam          ::=  an arbitrary class name
Interface name           inam          ::=  an arbitrary interface name
Reference type           ref           ::=  cnam | inam | null
Primitive type           prim          ::=  int
Void type                void
Type that is not null    notnull       ::=  cnam | inam | prim
Type                     ty            ::=  ref | prim
Reference type set       refs          ::=  $\{\overline{ref_n}\}$ $(n > 0)$
Type set                 tys           ::=  refs | prim
Subroutine type          sbr           ::=  sbr(pp) | invldsbr
Raw object type          raw           ::=  unin(pp, cnam) | init(cnam)
Type or void             notnull_void  ::=  notnull | void
Reference type set or    refs_raw      ::=  refs | raw
  raw object type
Reference type set,      refs_raw_sbr  ::=  refs | raw | sbr
  raw object type or
  subroutine type
Anything                 any           ::=  tys | raw | sbr | unusable

Figure 2: Auxiliary symbols denoting zero or one of several static types



6 Program Point Types and Program Types

In general, there is no guarantee that any class file that is asked to be loaded 
is properly formed. Thus according to the OJVMS, the bytecode verifier should 
ensure that the class file satisfies some constraints. In particular, the bytecode 
verifier should be able to statically derive a static type for each local variable 
and operand stack entry at each program point, and ensure that the derived 
static types satisfy some constraints. 

For this purpose, we define a local variable table type as a list of the form
$[any_0, \cdots, any_n]$ with $n \ge -1$. We use lvsty to range over all local
variable table types. For $lvsty = [any_0, \cdots, any_n]$ and
$lvsty' = [any'_0, \cdots, any'_m]$, we define that $lvsty \sqsupseteq lvsty'$
holds if and only if $n = m$ and $any_i \sqsupseteq any'_i$ holds for all
$i = 0, \ldots, n$.

We define an operand stack type as a list of the form $[any_0, \cdots, any_n]$
with $n \ge -1$. We use stkty to range over all operand stack types. For
$stkty = [any_0, \cdots, any_n]$ and $stkty' = [any'_0, \cdots, any'_m]$, we
define that $stkty \sqsupseteq stkty'$ holds if and only if $n = m$ and
$any_i \sqsupseteq any'_i$ holds for all $i = 0, \ldots, n$.

The above definitions imply that a local variable or an operand stack entry
can hold values of arbitrary static types.

To record whether an instance initialization method has been called inside
another instance initialization method, we use three initialization tags, namely
notinitd, initd and unknown. We use intag to range over all initialization tags.
A $\sqsupseteq$-relation is defined on these tags as follows:

$intag \sqsupseteq intag'$ if and only if $intag = unknown$ or $intag = intag'$

We define a program point type as a tuple (lvsty, stkty, intag, mod), where mod
will be defined later in this chapter. We use ptty to range over all program
point types.

Let ptty = (lvsty, stkty, intag, mod) and ptty' = (lvsty', stkty', intag', mod').
The relation $ptty \sqsupseteq ptty'$ holds if and only if
$lvsty \sqsupseteq lvsty'$, $stkty \sqsupseteq stkty'$,
$intag \sqsupseteq intag'$ and $mod \sqsupseteq mod'$ hold, where the last
relation will be defined later in this chapter.

Intuitively, the relation $ptty \sqsupseteq ptty'$ is used to ensure that any
instruction that is applicable to all program states of the program point type
ptty must be applicable to all program states of the program point type ptty'.
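The componentwise definitions of this section can be sketched as follows. The element-level covering relation and the relation on mod are passed in as parameters, because the former depends on the static types and the latter is defined elsewhere in the chapter; the toy versions covers_elem and covers_mod below are placeholders assumed only for this example.

```python
def covers_list(xs, ys, covers_elem):
    """Lists cover each other only with equal length and pointwise covering."""
    return len(xs) == len(ys) and all(covers_elem(x, y) for x, y in zip(xs, ys))


def covers_intag(intag, intag2):
    """intag covers intag' iff intag is unknown or both tags are equal."""
    return intag == "unknown" or intag == intag2


def covers_ptty(ptty, ptty2, covers_elem, covers_mod):
    """Program point types are compared component by component."""
    lvsty, stkty, intag, mod = ptty
    lvsty2, stkty2, intag2, mod2 = ptty2
    return (covers_list(lvsty, lvsty2, covers_elem)
            and covers_list(stkty, stkty2, covers_elem)
            and covers_intag(intag, intag2)
            and covers_mod(mod, mod2))


def covers_elem(a, b):
    """Toy element relation (assumption): reflexivity plus unusable on top."""
    return a == b or a == "unusable"


def covers_mod(mod, mod2):
    """Placeholder (assumption): the real relation on mod is defined later."""
    return mod == mod2


p = (["int", "unusable"], ["int"], "unknown", None)
q = (["int", "int"], ["int"], "initd", None)
```

Here p covers q, but not the other way round, since initd does not cover unknown and int does not cover unusable.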

For the program Prg, a program type is a mapping
$\{pp \mapsto ptty_{pp} \mid pp \in allPP(Prg)\}$. We use prgty to range over all
program types. Let prgty and prgty' be two program types. Then we define that
$prgty \sqsupseteq prgty'$ holds if and only if
$prgty(pp) \sqsupseteq prgty'(pp)$ holds for all $pp \in allPP(Prg)$. These
concepts can also be defined for the fixed method Mth.

7 The Reference Type Hierarchy 

A reference type hierarchy in the JVM is as in Java. Following the OJVMS
(§ 2.6.4), we formally define a subtyping relation widRefConvert as the smallest
reflexive transitive relation on all reference type sets refs satisfying:

widRefConvert(cnam, cnam')   if cnam extends cnam'
widRefConvert(cnam, inam)    if cnam implements inam
widRefConvert(inam, inam')   if inam extends inam'
widRefConvert(inam, Object)
widRefConvert(null, ref)
widRefConvert($\{\overline{ref_n}\}$, $\{\overline{ref'_m}\}$)   if $\forall (1 \le i \le n). \exists (1 \le j \le m).$ widRefConvert($ref_i$, $ref'_j$)

Note that we do not consider array types. We use the relation diSubcls to denote
the direct subclass relation on classes.

To constrain the types of the actual and formal parameters in a method
invocation we define the relation invoConvert on all reference type sets and the
primitive type int as

$invoConvert := widRefConvert \cup \{(int, int)\}$

Note that $\{(int, int)\}$ is a degenerate case of the widening primitive
conversion in the OJVMS (§ 2.6.2). It suffices for us to have the degenerate
case, since we consider only one primitive type int.
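The relations widRefConvert and invoConvert can be sketched over a small, hypothetical class hierarchy. The tables EXTENDS, IMPLEMENTS and IFACE_EXTENDS and all class and interface names are assumptions of this example; the reflexive transitive closure is computed by a simple search over direct supertypes, and a reference type set is converted when every element of the source set converts to some element of the target set.

```python
OBJECT = "Object"
NULL = "null"

# Hypothetical hierarchy, for illustration only.
EXTENDS = {"B": "A", "A": OBJECT}          # class -> direct superclass
IMPLEMENTS = {"B": {"J"}}                  # class -> directly implemented interfaces
IFACE_EXTENDS = {"J": {"I"}, "I": set()}   # interface -> direct superinterfaces


def direct_supertypes(ref):
    """One step of the rules for extends, implements and interfaces."""
    if ref in IFACE_EXTENDS:
        return IFACE_EXTENDS[ref] | {OBJECT}   # every interface converts to Object
    supers = set(IMPLEMENTS.get(ref, set()))
    if ref in EXTENDS:
        supers.add(EXTENDS[ref])
    return supers


def wid_ref_convert(ref, target):
    """Reflexive transitive closure on single reference types."""
    if ref == target or ref == NULL:
        return True
    seen, todo = set(), [ref]
    while todo:
        cur = todo.pop()
        if cur == target:
            return True
        if cur not in seen:
            seen.add(cur)
            todo.extend(direct_supertypes(cur))
    return False


def wid_refs_convert(refs, targets):
    """Set rule: every source type converts to some target type."""
    return all(any(wid_ref_convert(r, t) for t in targets) for r in refs)


def invo_convert(a, b):
    """invoConvert: widRefConvert on reference type sets, plus (int, int)."""
    if a == "int" or b == "int":
        return a == b
    return wid_refs_convert(a, b)
```

In this hierarchy, B converts to I through the interface J, whereas A converts only to itself and to Object.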

To constrain the types of the variable and the value in an assignment, we 
define the relation assConvert on all reference type sets and the primitive type
int as 

assConvert := invoConvert. 

The OJVMS requires that assConvert extends invoConvert by some narrowing 
primitive conversions for integer constants. We do not consider this difference 
for simplicity. 

Intuitively, if a reference type set contains both a super- and a subtype, 
then the subtype is redundant. Practically, a Bytecode Verifier could implement 
elimination of redundant reference types from a reference type set with respect 
to a subtyping relation as an optimization step. 
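The optimization step mentioned above might look as follows. This is a minimal sketch, assuming a subtype test is available as a function and that the subtyping relation is antisymmetric; the table SUPERS, the function names and the class names are hypothetical.

```python
# Hypothetical table: type -> all of its proper supertypes.
SUPERS = {"B": {"A", "Object"}, "A": {"Object"}, "I": {"Object"}}


def is_subtype(ref, target):
    """Reflexive subtype test over the table above."""
    return ref == target or target in SUPERS.get(ref, set())


def eliminate_redundant(refs, is_subtype):
    """Drop every type that has a different supertype in the same set."""
    return {r for r in refs
            if not any(s != r and is_subtype(r, s) for s in refs)}
```

For instance, the set {A, B} shrinks to {A} because B is a subtype of A, while {B, I} is left unchanged because neither type is a subtype of the other.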

8 Constant Pool Resolution 

According to the OJVMS, each class (or interface) should have a constant pool 
whose entries name entities like classes, interfaces, methods and fields referenced 
from the code of the class (or interface, respectively) or from other constant pool 
entries. An individual instruction in the class (or interface, respectively) may 
carry an index of an entry in the constant pool, and during the execution of 
this instruction, the JVM is responsible for resolving the entry, i.e. determining 
a concrete entity from the entry. This process of resolving an entry is called 
constant pool resolution. 

For our formal specification we introduce some defined functions, called
resolution functions, which hide the details of resolution. In fact, except that the
resolution processes should take correct sorts of data as arguments and yield
correct sorts of data or a failure as results, other details are not important at all
for the formal specification and proofs in this chapter. Nevertheless, we still
explain the definitions of the resolution functions, in order to give a sense of why
the resolution functions here are proper abstractions of the real resolution
processes.

A resolution function in this section often takes as parameters two
one-byte-wide integers ind1 and ind2, which build the index offset(ind1, ind2) in a
constant pool. In this sense, a resolution function always has a constant pool as
an implicit parameter.

The resolution function cResol(index1, index2) yields a class name cnam.
For any cnam, we define a function

allFields(cnam) := {(fnam, notnull) | fnam and notnull are the name
and type of a field in the class cnam}

Note that a field in the class cnam is either directly defined in the class or in a 
superclass of the class. Since a field name fnam is a qualified name, we need not 
consider the problem with hiding of fields. 

We define a resolution function for a field as

fResol(ind1, ind2) := (fnam, cnam, notnull)

where fnam is the name of the field, cnam the class containing the field
declaration, and notnull the type of the field.

We define a resolution function for a special method as

mResolSp(ind1, ind2) := (mnam, cnam, $(\overline{ty_n})$notnull_void, cod, nlv)

where mnam is the name of the method, cnam the class containing the
declaration of the method, $(\overline{ty_n})$notnull_void the descriptor of the
method, cod the method code and nlv the length of the local variable table in the
method.

We define a resolution function for a static method as

mResolSt(ind1, ind2) := (mnam, cnam, $(\overline{ty_n})$notnull_void, cod, nlv).

We define a resolution function for an instance method as

mResolV(ind1, ind2) := (mnam, cnam, $(\overline{ty_n})$notnull_void).

But the function mResolV(ind1, ind2) does not yield a method code. For this
purpose, we need to define another function

mSelV(obj, mnam, $(\overline{ty_n})$notnull_void) := (cod, nlv)

which takes an object obj, and yields the method code cod for the object obj
and the length nlv of the local variable table in the method.

^ Thanks to Gilad Bracha for clarifying comments on the semantics of method dispatch
at this point.

We define a resolution function for an interface method as

mResolI(ind1, ind2) := (mnam, inam, $(\overline{ty_n})$notnull_void)

where inam is the name of the interface, instead of the class, that contains the
declaration of the method. Furthermore, we define

mSelI(obj, mnam, $(\overline{ty_n})$notnull_void) := (cod, nlv).
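The chapter treats the selection functions as abstractions and hides their details. Purely as an illustration of what a function like mSelV might do, the following sketch looks up a method by name and descriptor, starting at the runtime class of the object and walking up the superclass chain. The class table, the descriptor strings, the object references and the name m_sel_v are all assumptions of this example.

```python
FAILURE = "failure"   # assumption: stand-in for the special value failure

# Hypothetical class table:
# class -> (direct superclass, {(mnam, descriptor): (cod, nlv)})
CLASSES = {
    "Object": (None, {("toString", "()String"): ("Object.toString", 1)}),
    "A": ("Object", {("run", "(int)void"): ("A.run", 2)}),
    "B": ("A", {("run", "(int)void"): ("B.run", 3)}),
    "C": ("A", {}),
}

# Hypothetical runtime classes of two objects on the heap.
HEAP_CLASS = {"o1": "B", "o2": "C"}


def m_sel_v(obj, mnam, descriptor):
    """Yield (cod, nlv) of the method selected for obj, or failure."""
    cnam = HEAP_CLASS.get(obj)
    while cnam is not None:
        superclass, methods = CLASSES[cnam]
        if (mnam, descriptor) in methods:
            return methods[(mnam, descriptor)]   # nearest declaration wins
        cnam = superclass
    return FAILURE
```

The object o1 of class B selects the overriding method in B, whereas o2 of class C inherits the method declared in A.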

For convenience we define the auxiliary function

mInfo(pp) := (mnam, cnam, $(\overline{ty_n})$notnull_void, nlv)

where mnam is the name of the method containing pp,
$(\overline{ty_n})$notnull_void the descriptor of the method, cnam the class
containing the declaration of the method, and nlv the number of local variables
in the method.

In order to find out whether a method is an instance or a static method,
we define the following relations:

— instMeth(mnam, cnam, $(\overline{ty_n})$notnull_void) holds if and only if the
method mnam with the signature $(\overline{ty_n})$notnull_void is an instance
method.

— statMeth(mnam, cnam, $(\overline{ty_n})$notnull_void) holds if and only if the
method mnam with the signature $(\overline{ty_n})$notnull_void is a static
method.

9 Constraint Domain and Constraints 

The previous sections have in fact introduced (part of) a constraint domain for
our formal specification. Although all concepts in a constraint domain could be
defined completely formally without difficulty, we can only discuss (part of)
them informally in this chapter due to space limits.

First of all, all data, data structures (e.g. local variable table states, operand
stack states, program states), static types and type structures (e.g. local variable
table types, operand stack types, program point types, program types) defined in
the previous sections are elements of the constraint domain. These elements are
all sorted. Informally, every time we introduce an identifier to range over a
kind of data, data structures, static types or type structures, we introduce a sort.
We also use the introduced identifiers as names of these sorts. So it is possible
for one sort to have several names. For example, the sort byt consists of all
one-byte wide data, the sort wrd of all one-word wide data, the sort pp of all
program points, the sort lvs of all local variable table states, the sort stk of all
operand stack states, the sort stat of all program states, the sorts ref, refs, tys
and refs_raw of the corresponding static types, respectively, as defined in
Figure 2, the sort lvsty of all local variable table types, the sort stkty of all
operand stack types and the sort prgty of all program types. Standard data or type
structures, e.g. sets or lists of some data or types, also build sorts, but do not
necessarily have a sort name.

There is a subsort relation among the sorts, which corresponds to the subset
relation. In particular, Figure 2 defines that if a sort occurs as an alternative on
the right of ::=, then the sort on the left of ::= is a supersort of it. For example,
the sort ref is a supersort of the sorts cnam and inam and contains null, the
sort prim contains int, the sort notnull is a supersort of the sorts cnam, inam
and prim, refs contains all $\{\overline{ref_n}\}$, where each $ref_i$ is an
element of the sort ref, etc. Since a singleton reference type set {ref} and the
reference type ref are regarded as the same static type, we define that the sort
refs is a supersort of the sort ref.

For each sort, there is a countable set of variables. In general, the completely
capitalized version of a sort name denotes a variable of the sort. For example,
BYT is a variable of byt and WRD a variable of wrd. For notational simplicity we
also introduce the variable P for the sort pp, L for the sort lvs, S for the sort
stk, J for the sort jstk, H for the sort hp, LG for the sort lvstag, SG for the sort
stktag, S for the sort stat, LT for the sort lvsty, ST for the sort stkty, IT for
the sort intag, M for mod, $\Pi$ for the sort ptty and for the sort prgty. We use
_ to denote a wildcard variable.

In general, terms are built using variables, constants and functions in the 
constraint domain. Terms are sorted as usual. A term of a subsort is always
a term of a supersort. Every term has a least sort. We will use the partially 
capitalized version of a sort name, where only the first letter is changed into a 
capital letter, to range over all terms of the sort. For example, Pp, Stat and Ptty 
range over the terms in the sorts pp, stat and ptty, respectively. 

A term containing no variables is called closed. In fact, each element in the 
constraint domain is a closed term. 

Logical formulas are built as in First-Order Predicate Logic, where predicates 
take only sorted arguments in the constraint domain. We use q and r to range 
over all logical formulas. 

We use the form $q[\overline{s_n}]$ to denote a logical formula containing the (occurrences
of) terms $s_1, \ldots, s_n$. If the forms $q[\overline{s_n}]$ and $q[\overline{t_n}]$ occur in the same
context (e.g. the same rule), then $s_i$ and $t_i$ are of the same sort for
$i = 1, \ldots, n$, and $q[\overline{t_n}]$ is the logical formula obtained from $q[\overline{s_n}]$ by
replacing each $s_i$ by $t_i$ for $i = 1, \ldots, n$.

A substitution is a finite mapping of the form $\{X_1 \mapsto s_1, \ldots, X_n \mapsto s_n\}$,
where the sort of each term $s_i$ must be a subsort of the sort of $X_i$ for all
$i = 1, \ldots, n$. We consider only closed substitutions in this chapter, i.e. where
$s_i$ is a closed term for $i = 1, \ldots, n$. We use $\sigma$ to range over all closed
substitutions.

A constraint is a logical formula. A set of constraints $\{q_1, \ldots, q_m\}$ represents
the logical formula $q_1 \wedge \cdots \wedge q_m \wedge true$.

A constraint q is satisfied under a substitution $\sigma$ if and only if $\sigma(q)$ is closed
and holds in the constraint domain. A constraint q is satisfiable if there is a
substitution under which the constraint q is satisfied.
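The definitions above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: a closed substitution is modelled as a dictionary, a constraint as a predicate over that dictionary, and satisfiability is checked by brute force over a small finite domain. The names `satisfied`, `satisfiable` and the example constraints are hypothetical and do not come from the specification.

```python
from itertools import product


def satisfied(constraints, sigma):
    """A constraint set stands for the conjunction q1 and ... and qm and true."""
    return all(q(sigma) for q in constraints)


def satisfiable(constraints, variables, domain):
    """Search a finite domain for a closed substitution satisfying all constraints.

    Returns the first satisfying substitution found, or None if there is none.
    """
    for values in product(domain, repeat=len(variables)):
        sigma = dict(zip(variables, values))
        if satisfied(constraints, sigma):
            return sigma
    return None


# Hypothetical constraints over the variables X and Y (assumed for illustration).
constraints = [
    lambda s: s["X"] + 1 == s["Y"],
    lambda s: s["Y"] > 2,
]
```

The empty constraint set is satisfied under every substitution, which matches the trailing conjunct true in the definition.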

In our formal specification, we may define a function $f$ that yields results
in a sort $a$ for some arguments and the special value failure, which is not in
$a$, for all other arguments, and use a term $f(\overline{s_n})$ in a constraint, say
$q[f(\overline{s_n})]$, where a term of $a$ is required. Intuitively, this usage always
implicitly requires that $f(\overline{s_n})$ should not yield failure. Formally, we may
always define a new sort $a'$, which is a supersort of $a$ and contains failure as a
constant, define $f$ to have the result sort $a'$, and replace the constraint
$q[f(\overline{s_n})]$ by the constraints $q[X]$ and $X = f(\overline{s_n})$, where $X$ is a new
variable of the sort $a$. The reason why the constraint $X = f(\overline{s_n})$ assures
that $f(\overline{s_n})$ is not equal to failure is that failure is not in the sort $a$ and
thus $X = failure$ is never satisfiable. (Note that if there are two functions
yielding failure, then we need to assume that they yield different failure values;
otherwise the least sort of the term failure may not exist.)

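Our formal specification consists of two parts. The first part defines a state
transition relation on program states, $stat \Longrightarrow stat'$, read as “stat changes
into stat'”. The relation is defined by state transition rules of the following form:

$$\frac{Premises}{\Sigma[\overline{s_n}] \Longrightarrow \Sigma[\overline{t_n}]}$$

where Premises is a set of constraints. Let

$$Q := TV(Premises) \cup TV(\Sigma[\overline{s_n}] \Longrightarrow \Sigma[\overline{t_n}])$$

Then the rule means that if all constraints in Premises are satisfied under a
substitution $\sigma$, then $\sigma(\Sigma)[\sigma(\overline{s_n})] \Longrightarrow \sigma(\Sigma)[\sigma(\overline{t_n})]$ holds. In the sequel,
we may also say that $\Sigma[\overline{s_n}]$ changes into $\Sigma[\overline{t_n}]$ in the informal discussion
for simplicity.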

To specify all program types of a program, the following two forms of con- 
straints are particularly important: 

$$Prgty(Pp) = Ptty \quad \text{and} \quad Prgty(Pp) \sqsupseteq Ptty$$

The former says that the program point type at Pp in Prgty is Ptty. The latter
says that the program point type at Pp in Prgty covers Ptty. If a program point
Pp can be reached by more than one preceding program point, then it is quite 
convenient to write a constraint of the latter form to constrain the program point 
type at Pp. 

The type system in our formal specification should introduce constraints on
one program type for the method Mth. Therefore, we require that all typing
rules contain one common program type variable $\Phi$.
In general, a typing rule is in the form:

$$\frac{\begin{array}{c} AC \\ CC \end{array}}{SC}$$

The AC is a set of logical formulas, called applicability conditions, and contains
a distinguished constraint $Mth(P) = Instr$. The term Instr gives the form of
an instruction. Intuitively, AC is used to determine a program point P where
the rule can be applied. The identifier CC stands for a set of logical formulas. It
contains no logical formulas of the form $\Phi(P) \sqsupseteq Ptty$. Intuitively, CC constrains
$\Phi(P)$. The identifier SC stands for a set of logical formulas of the form
$\Phi(Pp') \sqsupseteq Ptty$, where in most cases Pp' stands for a successor program point.

The reason for writing a typing rule in the form above is that a typing
rule also suggests an intuitive data-flow analysis step. Roughly speaking, if the
data-flow analysis arrives at a program point Pp satisfying AC, in particular the
constraint $Mth(Pp) = Instr$ in AC, and if the program type at Pp satisfies CC,
then the program type at each successor program point Pp' should satisfy the
corresponding constraint in SC.

Let

$$Q := TV(AC) - \{\Phi\}$$

$$Q' := TV(CC \cup SC) - (\{\Phi\} \cup Q)$$

then a typing rule as above formally introduces the constraint

$$\forall Q.\,(AC \Rightarrow \exists Q'.\,(CC \cup SC))$$

It is easy to see that the constraint holds if and only if, whenever AC is satisfied
under a substitution $\sigma$ with $Dom(\sigma) = Q \cup \{\Phi\}$, there is a substitution $\sigma'$ with
$Dom(\sigma') = Q \cup Q' \cup \{\Phi\}$ such that $\sigma'$ agrees with $\sigma$ on $Dom(\sigma)$ and
$\sigma'(CC \cup SC)$ holds.

Let $Constrs_{Mth}$ denote the set of the constraints introduced as above from
all typing rules. Then we say that the method Mth has a program type prgty, or
that a program type prgty is a program type of the method Mth, if and only if all
constraints in $Constrs_{Mth}$ are satisfied under $\{\Phi \mapsto prgty\}$. Note that a program
may have more than one program type. For example, a local variable that is not 
used in a method may be given an arbitrary static type in a program type. A 
program is said to be statically well-typed if and only if it has a program type. 



10 The Rules in the Formal Specification 

There are constraints that should occur in many rules. We omit the explicit 
presentation of the following constraints for notational simplicity. 

— The CC in a typing rule always implicitly contains a constraint
$Pp \in allPP(Mth)$ for each $\Phi(Pp) \sqsupseteq Ptty$ in the SC. This assures that Pp is
always a program point, i.e. a starting address of an instruction.

— In the specification we only consider the instructions for one-word wide data. 
Thus the rules are all based on the assumption that all data in local variables 
and the operand stack are one-word wide. 



10.1 Load and Store Instructions 

The state transitions for loading and storing objects and integers of type int are
defined by the rules in Figure 3. The aload and iload instructions load a local
variable onto the operand stack. The astore and istore instructions store a
value from the operand stack in a local variable.
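As an informal companion to these state transitions, the following Python sketch executes one load or store step. It assumes a simplified state `(P, L, S)` in which the local variable table is a dictionary and the operand stack is a list with its top at the end; the function name `step_load_store` and the program encoding are hypothetical and are not part of the specification.

```python
def step_load_store(prg, state):
    """Perform one transition for aload/iload/astore/istore, or return None.

    prg maps a program point to a pair (opcode, IND); state is (P, L, S).
    None means that no state transition rule is applicable.
    """
    p, lvs, stk = state
    instr = prg.get(p)
    if instr is None:
        return None
    op, ind = instr
    if op in ("aload", "iload"):
        # Push L(IND) onto the operand stack and continue at P + 2.
        if ind not in lvs:
            return None
        return (p + 2, lvs, stk + [lvs[ind]])
    if op in ("astore", "istore"):
        # Pop WRD and continue at P + 2 with L[IND -> WRD].
        if not stk:
            return None
        new_lvs = dict(lvs)
        new_lvs[ind] = stk[-1]
        return (p + 2, new_lvs, stk[:-1])
    return None


# Hypothetical two-instruction program: copy local variable 1 into local variable 0.
prg = {0: ("iload", 1), 2: ("istore", 0)}
s0 = (0, {0: 7, 1: 42}, [])
s1 = step_load_store(prg, s0)
s2 = step_load_store(prg, s1)
```

Each step returns a fresh state instead of mutating the old one, which mirrors reading a rule as a relation between two states.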

The typing rules for load and store instructions are given in Figure 4. We
explain rule (T-1) to show some of the tricky points in the formulation of
constraints. First, $REFS\_RAW = LT(IND)$ expresses a membership constraint, i.e.
that the static type $LT(IND)$ should be in the sort refs_raw, since REFS_RAW
can only be instantiated by an element in the sort refs_raw. It implies that an
$$\frac{Prg(P) = \texttt{aload}\ IND \ \text{or}\ \texttt{iload}\ IND}{\Sigma[P, L, S] \Longrightarrow \Sigma[P + 2, L, S + L(IND)]} \quad \text{(S-1),(S-2)}$$

$$\frac{Prg(P) = \texttt{astore}\ IND \ \text{or}\ \texttt{istore}\ IND}{\Sigma[P, L, S + WRD] \Longrightarrow \Sigma[P + 2, L[IND \mapsto WRD], S]} \quad \text{(S-3),(S-4)}$$

Figure 3: The state transitions for load and store instructions



aload instruction can load both initialized and uninitialized objects. In
addition, rule (T-1) says that the local variable table type at $P + 2$ (i.e. after the
instruction) should componentwise cover that at P (before the instruction). The
same should also hold for the operand stack type, except that the operand stack
type at $P + 2$ should be extended by the static type of the IND-th local variable.
A similar constraint should also hold on the components M at P and Mod' at
$P + 2$. The precise definitions of M and Mod' will be given in later sections.
Note that the variables $\Phi$ and LT in the terms $\Phi(P)$ and $LT(IND)$
are not higher-order (i.e. function) variables, since the terms of this form in this
chapter can always be regarded as applications of an implicit function app on
two first-order arguments.

Similar explanations can be given for the other three typing rules. One point
that is worth noticing in rule (T-3) is that the variable REFS_RAW_SBR can be
instantiated into an element of the sort sbr, whereas the variable REFS_RAW in
rule (T-1) cannot. This means that, as required in the OJVMS and implemented
in Sun's implementation, an astore instruction can store a (valid or invalid)
byte offset, whereas an aload instruction cannot load it.

An aconst_null instruction loads the reference null. Its state transition rule
and typing rule are defined in Figure 5.

The state transitions for getfield and putfield are defined in Figure 6.
A getfield instruction replaces an object reference at the top of the operand 
stack by the content of a field of the referenced object. 

A putfield instruction stores the content at the top of the operand stack 
into a field of the object referenced by the second top of the operand stack. 

The typing rules for getfield and putfield are given in Figure 7. The
sort of the variable REFS in $\Pi[ST + REFS]$ and $\Pi[ST + REFS + TYS]$ assures
that the OBJ in Figure 6 really references an object. The constraint
$widRefConvert(REFS, CNAM)$ assures that in Figure 6, if $OBJ \in Dom(H)$,
then $FNAM \in Dom(H(OBJ))$ holds, i.e. both $H(OBJ)(FNAM)$ and
$H(OBJ)[FNAM \mapsto WRD]$ are defined and make sense. But the typing rules do not
ensure that the condition $OBJ \in Dom(H)$ in Figure 6 holds, since the OBJ
may hold null at run time. If $OBJ \notin Dom(H)$ holds, then $H(OBJ)$ yields a
failure. Thus the Premises in both rules in Figure 6 are not satisfiable. In fact,
in this case we would need another state transition rule to describe which kind
of runtime exception can be thrown. However, as mentioned before, our formal
specification does not consider this.
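The role of the failure value can be made concrete with a short Python sketch. It assumes a heap modelled as a dictionary from object references to records (dictionaries from field names to values), takes the field name as already resolved (so the resolution function of the rules is not modelled), and returns `None` when the premises are not satisfiable. The names `FAILURE`, `heap_lookup`, `step_getfield` and `step_putfield` are illustrative assumptions only.

```python
FAILURE = object()  # stands for the special value "failure"


def heap_lookup(heap, obj):
    """H(OBJ): yields FAILURE if OBJ is not in Dom(H), e.g. when OBJ is null."""
    return heap.get(obj, FAILURE)


def step_getfield(state, fname):
    """Replace the reference on top of the stack by the content of field fname."""
    p, stk, heap = state
    rec = heap_lookup(heap, stk[-1])
    if rec is FAILURE or fname not in rec:
        return None  # premises not satisfiable: no transition is defined
    return (p + 3, stk[:-1] + [rec[fname]], heap)


def step_putfield(state, fname):
    """Store the top of the stack into field fname of the object below it."""
    p, stk, heap = state
    obj, wrd = stk[-2], stk[-1]
    rec = heap_lookup(heap, obj)
    if rec is FAILURE or fname not in rec:
        return None
    new_heap = dict(heap)
    new_heap[obj] = {**rec, fname: wrd}  # H[OBJ -> REC] with REC = H(OBJ)[FNAM -> WRD]
    return (p + 3, stk[:-2], new_heap)


heap = {"o1": {"x": 5}}
```

With null modelled as `None`, `step_getfield((0, [None], heap), "x")` returns `None`: the sketch, like the specification, simply has no rule for that case rather than describing the runtime exception.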
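$$\frac{\begin{array}{c} Mth(P) = \texttt{aload}\ IND \\ \Phi(P) = \Pi[LT, ST, M] \\ REFS\_RAW = LT(IND) \end{array}}{\Phi(P + 2) \sqsupseteq \Pi[LT, ST + REFS\_RAW, Mod']} \quad \text{(T-1)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{iload}\ IND \\ \Phi(P) = \Pi[LT, ST, M] \\ int = LT(IND) \end{array}}{\Phi(P + 2) \sqsupseteq \Pi[LT, ST + int, Mod']} \quad \text{(T-2)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{astore}\ IND \\ \Phi(P) = \Pi[LT, ST + REFS\_RAW\_SBR, M] \end{array}}{\Phi(P + 2) \sqsupseteq \Pi[LT[IND \mapsto REFS\_RAW\_SBR], ST, Mod']} \quad \text{(T-3)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{istore}\ IND \\ \Phi(P) = \Pi[LT, ST + int, M] \end{array}}{\Phi(P + 2) \sqsupseteq \Pi[LT[IND \mapsto int], ST, Mod']} \quad \text{(T-4)}$$

Figure 4: The typing rules for load and store instructions



$$\frac{Prg(P) = \texttt{aconst\_null}}{\Sigma[P, S] \Longrightarrow \Sigma[P + 1, S + null]} \quad \text{(S-5)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{aconst\_null} \\ \Phi(P) = \Pi[ST] \end{array}}{\Phi(P + 1) \sqsupseteq \Pi[ST + null]} \quad \text{(T-5)}$$

Figure 5: The state transitions for aconst_null and bipush



$$\frac{\begin{array}{c} Prg(P) = \texttt{getfield}\ IND1\ IND2 \\ (FNAM, \_, NOTNULL) = fResol(IND1, IND2) \\ WRD = H(OBJ)(FNAM) \end{array}}{\Sigma[P, S + OBJ, H] \Longrightarrow \Sigma[P + 3, S + WRD, H]} \quad \text{(S-6)}$$

$$\frac{\begin{array}{c} Prg(P) = \texttt{putfield}\ IND1\ IND2 \\ (FNAM, \_, \_) = fResol(IND1, IND2) \\ REC = H(OBJ)[FNAM \mapsto WRD] \end{array}}{\Sigma[P, S + OBJ + WRD, H] \Longrightarrow \Sigma[P + 3, S, H[OBJ \mapsto REC]]} \quad \text{(S-7)}$$

Figure 6: The state transitions for getfield and putfield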
$$\frac{\begin{array}{c} Mth(P) = \texttt{getfield}\ IND1\ IND2 \\ \Phi(P) = \Pi[ST + REFS] \\ (\_, CNAM, NOTNULL) = fResol(IND1, IND2) \\ widRefConvert(REFS, CNAM) \end{array}}{\Phi(P + 3) \sqsupseteq \Pi[ST + NOTNULL]} \quad \text{(T-6)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{putfield}\ IND1\ IND2 \\ \Phi(P) = \Pi[ST + REFS + TYS] \\ (\_, CNAM, NOTNULL) = fResol(IND1, IND2) \\ widRefConvert(REFS, CNAM) \\ assConvert(TYS, NOTNULL) \end{array}}{\Phi(P + 3) \sqsupseteq \Pi[ST]} \quad \text{(T-7)}$$

Figure 7: The typing rules for getfield and putfield



10.2 Object Creation 

A new instruction creates an object. The state transition and typing rules for
the instruction are defined in Figure 8.



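$$\frac{\begin{array}{c} Prg(P) = \texttt{new}\ IND1\ IND2 \\ CNAM = cResol(IND1, IND2) \\ OBJ \notin Dom(H) \\ Dom(REC) = allFields(CNAM) \end{array}}{\Sigma[P, S, H] \Longrightarrow \Sigma[P + 3, S + OBJ, H[OBJ \mapsto REC]]} \quad \text{(S-8)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{new}\ IND1\ IND2 \\ \Phi(P) = \Pi[LT, ST] \\ CNAM = cResol(IND1, IND2) \\ unin(P, CNAM) \notin LT \\ unin(P, CNAM) \notin ST \end{array}}{\Phi(P + 3) \sqsupseteq \Pi[LT, ST + unin(P, CNAM)]} \quad \text{(T-8)}$$

Figure 8: The state transition and the typing rule for new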



The condition $OBJ \notin Dom(H)$ in rule (S-8) assures that the object reference
OBJ is new. Rule (T-8) says that the operand stack type after the instruction
covers one with $unin(P, CNAM)$ at the top, which indicates that the operand
stack may hold an object that has not been initialized by an instance
initialization method <init>, i.e. an uninitialized object. Indeed, a typing rule that
forbids the use of a memory location with a static type of the form $unin(\_, \_)$
forbids the use of an uninitialized object.

^ The OJVMS mentions such a type but gives no details on how it can be used in the 
specification. 
The constraints $unin(P, CNAM) \notin LT$ and $unin(P, CNAM) \notin ST$ assure
that at the program point P, no new object created by the same new instruction
at P can be used as a new object. This is strictly weaker than saying that there
is no new object created by the new instruction at P, since a memory location
at P is still allowed to hold a new object created by the new instruction at P if
the memory location has the type unusable. For an example, see Section^] 
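The two non-membership constraints can be pictured with a small Python sketch. It assumes static types are plain Python values, with a hypothetical tuple `("unin", P, CNAM)` for the type of an uninitialized object created at program point P, and it returns the local variable and operand stack types that the program point type at P + 3 has to cover. The function name `type_new` is illustrative only.

```python
def type_new(p, cnam, lt, st):
    """Check the constraints of the rule for `new` at program point p.

    Returns the pair (LT, ST + unin(p, cnam)) to be covered at p + 3,
    or None if unin(p, cnam) already occurs in LT or ST.
    """
    fresh = ("unin", p, cnam)
    if fresh in lt or fresh in st:
        return None  # an object from the same `new` would still be usable
    return (lt, st + [fresh])
```

Note that a location typed unusable does not block the rule, even if at run time it still holds an object created earlier by the same instruction.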



10.3 Operand Stack Management Instructions 

We only give the rules for dup in Figure 9. The rules for other instructions are
similar.



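$$\frac{Prg(P) = \texttt{dup}}{\Sigma[P, S + WRD] \Longrightarrow \Sigma[P + 1, S + WRD + WRD]} \quad \text{(S-9)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{dup} \\ \Phi(P) = \Pi[ST + ANY] \end{array}}{\Phi(P + 1) \sqsupseteq \Pi[ST + ANY + ANY]} \quad \text{(T-9)}$$

Figure 9: The state transition and typing rules for dup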



10.4 Control Transfer Instructions 



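$$\frac{\begin{array}{c} Prg(P) = \texttt{if\_acmpeq}\ BYT1\ BYT2 \\ OBJ1 = OBJ2 \Rightarrow P' = P + offset(BYT1, BYT2) \\ OBJ1 \neq OBJ2 \Rightarrow P' = P + 3 \end{array}}{\Sigma[P, S + OBJ1 + OBJ2] \Longrightarrow \Sigma[P', S]} \quad \text{(S-10)}$$

$$\frac{\begin{array}{c} Prg(P) = \texttt{if\_icmpeq}\ BYT1\ BYT2 \\ VAL1 = VAL2 \Rightarrow P' = P + offset(BYT1, BYT2) \\ VAL1 \neq VAL2 \Rightarrow P' = P + 3 \end{array}}{\Sigma[P, S + VAL1 + VAL2] \Longrightarrow \Sigma[P', S]} \quad \text{(S-11)}$$

$$\frac{Prg(P) = \texttt{goto}\ BYT1\ BYT2}{\Sigma[P] \Longrightarrow \Sigma[P + offset(BYT1, BYT2)]} \quad \text{(S-12)}$$

Figure 10: The state transitions for control transfer instructions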



All control transfer instructions can be dealt with in a very similar way. We
consider only a few control transfer instructions. The state transitions for these
instructions are given in Figure 10. They are quite straightforward.
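A short Python sketch may help to see how little these transitions do. It assumes that the function offset combines the two operand bytes into a signed 16-bit branch offset, as in the JVM instruction encoding, and it uses a simplified state `(P, S)` with the operand stack as a list. Comparing references with `==` is a simplification of reference equality, and the name `step_branch` is hypothetical.

```python
def offset(byt1, byt2):
    """Signed 16-bit branch offset built from two unsigned bytes (assumed encoding)."""
    value = (byt1 << 8) | byt2
    return value - 0x10000 if value & 0x8000 else value


def step_branch(instr, state):
    """One transition for goto, if_icmpeq or if_acmpeq on a state (P, S)."""
    p, stk = state
    op, b1, b2 = instr
    if op == "goto":
        return (p + offset(b1, b2), stk)
    if op in ("if_icmpeq", "if_acmpeq"):
        if len(stk) < 2:
            return None
        v1, v2 = stk[-2], stk[-1]
        # Branch if the two popped values are equal, else fall through to P + 3.
        target = p + offset(b1, b2) if v1 == v2 else p + 3
        return (target, stk[:-2])
    return None
```

For instance, `step_branch(("if_icmpeq", 0, 10), (20, [1, 7, 7]))` pops both operands and continues at program point 30.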
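$$\frac{\begin{array}{c} Mth(P) = \texttt{if\_acmpeq}\ BYT1\ BYT2 \\ \Phi(P) = \Pi[ST + REFS + REFS'] \end{array}}{\begin{array}{c} \Phi(P + offset(BYT1, BYT2)) \sqsupseteq \Pi[ST] \\ \Phi(P + 3) \sqsupseteq \Pi[ST] \end{array}} \quad \text{(T-10)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{if\_icmpeq}\ BYT1\ BYT2 \\ \Phi(P) = \Pi[ST + int + int] \end{array}}{\begin{array}{c} \Phi(P + offset(BYT1, BYT2)) \sqsupseteq \Pi[ST] \\ \Phi(P + 3) \sqsupseteq \Pi[ST] \end{array}} \quad \text{(T-11)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{goto}\ BYT1\ BYT2 \\ \Phi(P) = \Pi \end{array}}{\Phi(P + offset(BYT1, BYT2)) \sqsupseteq \Pi} \quad \text{(T-12)}$$

Figure 11: The typing rules for control transfer instructions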



The OJVMS requires (page 133) that no uninitialized objects may exist on 
the operand stack or in a local variable when a control transfer instruction causes 
a backwards branch. In our specification this requirement is unnecessary, thanks 
to rule (TH . 



10.5 Method Invocation and Return Instructions 

The state transitions for method invocation instructions are defined in Figure 12.
We first consider the state transition rule (S-13) for invokespecial. Since the
instruction is only used to invoke instance initialization methods <init> and
private methods, and to perform method invocations via super, we use the
function mResolSp. The state transition says that the execution of the invoked
method starts with a program state in which the operand stack is empty and the
local variables hold the object on which the method is invoked and all actual
arguments. We use the notation $lvs^n$ to denote an arbitrary local variable table
state with the length n.

The state transition for invokevirtual (or invokeinterface) is similar to
that for invokespecial. The only difference is that the former uses the
functions mResolV and mSelV (or mResolI and mSelI, respectively) to compute
the method code associated with OBJ, whereas the latter uses the function
mResolSp to do the same thing, independent of OBJ. Note that the bytes BYT
and 0 in an invokeinterface instruction are useless. They are contained in the
instruction for historical reasons.

Invocation of a method leads to the execution of a method code. The typing
rule in Figure 13 constrains the program point type at the beginning of a method
code. The rule is totally independent of method invocation instructions. The rule
says that the method must be an <init> method, an instance method or a static
method. The static types given for the local variables depend on what kind of
method it is. In
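$$\frac{\begin{array}{c} Prg(P) = \texttt{invokespecial}\ IND1\ IND2 \\ (\_, (\overline{TY_n})NOTNULL\_VOID, COD, NLV) = mResolSp(IND1, IND2) \end{array}}{\Sigma[P, J, L, S + OBJ + \overline{WRD_n}] \Longrightarrow \Sigma[COD, J + (L, S, P + 3), lvs^{NLV}[0 \mapsto OBJ, \overline{n \mapsto WRD_n}], [\,]]} \quad \text{(S-13)}$$

$$\frac{\begin{array}{c} Prg(P) = \texttt{invokevirtual}\ IND1\ IND2 \\ (MNAM, (\overline{TY_n})NOTNULL\_VOID) = mResolV(IND1, IND2) \\ MNAM \neq \texttt{<init>} \\ (COD, NLV) = mSelV(OBJ, MNAM, (\overline{TY_n})NOTNULL\_VOID) \end{array}}{\Sigma[P, J, L, S + OBJ + \overline{WRD_n}] \Longrightarrow \Sigma[COD, J + (L, S, P + 3), lvs^{NLV}[0 \mapsto OBJ, \overline{n \mapsto WRD_n}], [\,]]} \quad \text{(S-14)}$$

$$\frac{\begin{array}{c} Prg(P) = \texttt{invokeinterface}\ IND1\ IND2\ BYT\ 0 \\ (MNAM, (\overline{TY_n})NOTNULL\_VOID) = mResolI(IND1, IND2) \\ n = BYT - 1 \\ (COD, NLV) = mSelI(OBJ, MNAM, (\overline{TY_n})NOTNULL\_VOID) \end{array}}{\Sigma[P, J, L, S + OBJ + \overline{WRD_n}] \Longrightarrow \Sigma[COD, J + (L, S, P + 5), lvs^{NLV}[0 \mapsto OBJ, \overline{n \mapsto WRD_n}], [\,]]} \quad \text{(S-15)}$$

$$\frac{\begin{array}{c} Prg(P) = \texttt{invokestatic}\ IND1\ IND2 \\ (\_, \_, (\overline{TY_n})NOTNULL\_VOID, COD, NLV) = mResolSt(IND1, IND2) \end{array}}{\Sigma[P, J, L, S + \overline{WRD_n}] \Longrightarrow \Sigma[COD, J + (L, S, P + 3), lvs^{NLV}[i \mapsto WRD_{i+1} \mid 0 \le i < n], [\,]]} \quad \text{(S-16)}$$

Figure 12: The state transitions for method invocation instructions



$$\frac{\begin{array}{l}
Mth(P) = \_ \\
P = (\_, 0) \\
(MNAM, CNAM, (\overline{TY_n})NOTNULL\_VOID, NLV) = mInfo(P) \\
MNAM = \texttt{<init>} \Rightarrow \\
\quad (\, NOTNULL\_VOID = void \;\wedge \\
\quad\;\; (\, CNAM \neq Object \Rightarrow (\, LT = unusable^{NLV}[0 \mapsto init(CNAM), \overline{n \mapsto TY_n}] \wedge IT = notinitd \,)\,) \;\wedge \\
\quad\;\; (\, CNAM = Object \Rightarrow (\, LT = unusable^{NLV}[0 \mapsto CNAM, \overline{n \mapsto TY_n}] \wedge IT = Initd \,)\,)\,) \\
instMeth(MNAM, CNAM, (\overline{TY_n})NOTNULL\_VOID) \Rightarrow LT = unusable^{NLV}[0 \mapsto CNAM, \overline{n \mapsto TY_n}] \\
statMeth(MNAM, CNAM, (\overline{TY_n})NOTNULL\_VOID) \Rightarrow LT = unusable^{NLV}[i \mapsto TY_{i+1} \mid 0 \le i < n]
\end{array}}{\Phi(P) \sqsupseteq (LT, [\,], IT, mod_0)} \quad \text{(T-13)}$$

Figure 13: The typing rule for the starting program point of a method code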
general, however, each local variable that stores neither the object on which the
method is invoked nor an actual parameter is always given the type unusable.
This means that the content of such a local variable cannot be used before the
program explicitly assigns something to the local variable. We use $unusable^n$ to
denote the list [unusable, ..., unusable] consisting of n occurrences of unusable.

In the case of an <init> method, the local variable 0 stores the object being 
initialized. The static type for the local variable 0 and the initialization tag 
depend on whether the class CNAM containing the method code is Object or 
not. If CNAM is not Object, then the initialization tag is notinitd and the static
type for the local variable 0 is $init(CNAM)$. The initialization tag notinitd means
that an instance initialization method needs to be called exactly once within the
current method code in any case, since, as we will see, rule (T-14) will change the
initialization tag into Initd and rule (T-20) checks whether the initialization tag
is really Initd. Note that if CNAM is Object, the object being initialized cannot,
and need not, be initialized by another <init> within the current <init>.

Another point here is that the class CNAM is chosen to be the one containing
the <init> method. In fact, rule (T-14) will assure that CNAM is either the
original class of the object being initialized, or a superclass of it. Thus it is safe
to use CNAM in place of the original class.

The rule contains the component $mod_0$, which will be defined later.

The cases for an instance method and a static method are straightforward. 
Not much explanation for these rules is necessary. 
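The case analysis of the rule for the starting program point can be summarized in a few lines of Python. This is only a sketch under simplifying assumptions: the method kind is given as one of the strings `"init"`, `"instance"` or `"static"`, static types are plain Python values, `("init", CNAM)` is a hypothetical encoding of the type of the object being initialized, and the mod component is left out. The function name `initial_point_type` is illustrative.

```python
def initial_point_type(kind, cnam, arg_types, nlv):
    """Return (LT, ST, IT) required at the first program point of a method.

    kind is "init", "instance" or "static"; nlv is the number of local
    variables and is assumed to be large enough to hold all arguments.
    IT is None where the rule places no constraint on the initialization tag.
    """
    lt = ["unusable"] * nlv          # unusable^NLV
    it = None
    if kind == "static":
        first_arg = 0                # arguments start at local variable 0
    else:
        first_arg = 1                # local variable 0 holds the receiver
        if kind == "init" and cnam != "Object":
            lt[0] = ("init", cnam)
            it = "notinitd"
        else:
            lt[0] = cnam
            if kind == "init":
                it = "Initd"
    for i, ty in enumerate(arg_types):
        lt[first_arg + i] = ty
    return lt, [], it
```

Every local variable not mentioned by the rule keeps the type unusable, so reading it before an explicit store cannot be typed.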

The typing rules for method invocation instructions are given in Figures 14
and 15. Although these method invocation instructions are based on quite
different mechanisms, they all require that the operand stack at the program point
of the instruction contain the correct number of arguments with certain types.
In order to express this, each of the typing rules contains constraints of the
following forms:

$$(\ldots, (\overline{TY_n})NOTNULL\_VOID, \ldots) = a\_resolution\_function(IND1, IND2)$$

$$\Phi(P) = \Pi[\ldots, ST + \cdots + \overline{TYS_n}, \ldots]$$

$$invoConvert(TYS_i, TY_i) \quad (i = 1, \ldots, n)$$

We consider rule (T-14) for invokespecial in Figure 14 in detail. The rule
looks quite complicated, since the CC-part of the rule basically gives three cases.
The program point type at the program point of an invokespecial instruction
must satisfy one of these cases.

The first case is when an <init> method is invoked on an object on which
no <init> method has been invoked before. In this case, the operand stack
entry containing the object to be initialized has the static type $unin(P', CNAM)$.
Following the OJVMS, the rule requires that the class containing the <init>
method must be CNAM, and that after the instruction, all occurrences of
$unin(P', CNAM)$ are changed into CNAM, indicating that the object has been
initialized.









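$$\frac{\begin{array}{l}
Mth(P) = \texttt{invokespecial}\ IND1\ IND2 \\
(MNAM, CNAM, (\overline{TY_n})NOTNULL\_VOID, \_) = mResolSp(IND1, IND2) \\
\Phi(P) = \Pi[LT, ST + REFS\_RAW + \overline{TYS_n}, IT, M] \\
invoConvert(TYS_i, TY_i) \quad (i = 1, \ldots, n) \\
MNAM = \texttt{<init>} \Rightarrow \\
\quad (\,(\,(\, REFS\_RAW = unin(P', CNAM) \;\wedge \\
\quad\quad\;\; LT' = LT[CNAM / REFS\_RAW] \;\wedge \\
\quad\quad\;\; ST' = ST[CNAM / REFS\_RAW] \;\wedge \\
\quad\quad\;\; IT' = IT \;\wedge \\
\quad\quad\;\; M' = Mod_1 \,) \;\vee \\
\quad\quad (\, REFS\_RAW = init(CNAM') \;\wedge \\
\quad\quad\;\; IT = notinitd \;\wedge \\
\quad\quad\;\; (CNAM' = CNAM \vee diSubcls(CNAM', CNAM)) \;\wedge \\
\quad\quad\;\; LT' = LT[CNAM / REFS\_RAW] \;\wedge \\
\quad\quad\;\; ST' = ST[CNAM / REFS\_RAW] \;\wedge \\
\quad\quad\;\; IT' = Initd \;\wedge \\
\quad\quad\;\; M' = Mod_2 \,)\,) \;\wedge \\
\quad\;\; NOTNULL\_VOID = void \,) \\
MNAM \neq \texttt{<init>} \Rightarrow \\
\quad (\, widRefConvert(REFS\_RAW, CNAM) \;\wedge \\
\quad\;\; LT' = LT \;\wedge \\
\quad\;\; (\, NOTNULL\_VOID = NOTNULL \Rightarrow ST' = ST + NOTNULL \,) \;\wedge \\
\quad\;\; (\, NOTNULL\_VOID = void \Rightarrow ST' = ST \,) \;\wedge \\
\quad\;\; IT' = IT \;\wedge \\
\quad\;\; M' = M \,)
\end{array}}{\Phi(P + 3) \sqsupseteq \Pi[LT', ST', IT', M']} \quad \text{(T-14)}$$

Figure 14: The typing rule for invokespecial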



Note that the rule changes the component M into $Mod_1$ in the above case.
The definition of $Mod_1$ will be given later.

The second case is when the instruction invokes an <init> method on an
object that is being initialized within the enclosing <init> method, i.e. when the
initialization tag IT is notinitd and the operand stack entry for the object has
the static type $init(CNAM')$. In this case $init(CNAM')$ must have been introduced by
rule (T-13). As mentioned in the discussion for that rule, the enclosing method
must be in the class CNAM'. The constraint

$$(CNAM' = CNAM) \vee diSubcls(CNAM', CNAM)$$

means that the invoked <init> method is either in the same class as the enclosing
method or in the immediate superclass of it. Analogous to the first case above,
the instruction changes all occurrences of $init(CNAM')$ into CNAM, indicating
that after the instruction (but still inside the enclosing <init> method) the
object being initialized is regarded as having been initialized. In addition, the
constraint $IT' = Initd$ in the rule expresses the change of the initialization tag
into Initd. Rule (T-20) for return will use the tag to determine whether an
<init> method really invokes another <init> method or not.

The constraint NOTNULL_VOID = void assures that the <init> method 
has no return type. 

Note that the rule changes the component M into $Mod_2$ in the second case.
The definition of $Mod_2$ will be given later.

The third case concerns the invocation of a usual instance method (e.g. via
super). In this case, the constraint $widRefConvert(REFS\_RAW, CNAM)$ assures
that the class CNAM is a superclass of all possible classes of the object
on which the method is invoked. In addition, the constraint implies
that $REFS\_RAW = REFS$ holds. The method may or may not have a return type.
Accordingly, the operand stack type ST' after the instruction is either
$ST + NOTNULL$ or ST.



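$$\frac{\begin{array}{c} Mth(P) = \texttt{invokevirtual}\ IND1\ IND2 \\ (MNAM, CNAM, (\overline{TY_n})NOTNULL\_VOID, \_) = mResolV(IND1, IND2) \\ \Phi(P) = \Pi[ST + REFS + \overline{TYS_n}] \\ invoConvert(TYS_i, TY_i) \quad (i = 1, \ldots, n) \\ widRefConvert(REFS, CNAM) \\ MNAM \neq \texttt{<init>} \\ NOTNULL\_VOID = void \Rightarrow ST' = ST \\ NOTNULL\_VOID = NOTNULL \Rightarrow ST' = ST + NOTNULL \end{array}}{\Phi(P + 3) \sqsupseteq \Pi[ST']} \quad \text{(T-15)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{invokeinterface}\ IND1\ IND2\ BYT1\ BYT2 \\ BYT1 > 0 \\ BYT2 = 0 \\ (MNAM, INAM, (\overline{TY_{BYT1-1}})NOTNULL\_VOID) = mResolI(IND1, IND2) \\ \Phi(P) = \Pi[ST + REFS + \overline{TYS_{BYT1-1}}] \\ invoConvert(TYS_i, TY_i) \quad (i = 1, \ldots, BYT1 - 1) \\ widRefConvert(REFS, INAM) \\ MNAM \neq \texttt{<init>} \\ NOTNULL\_VOID = void \Rightarrow ST' = ST \\ NOTNULL\_VOID = NOTNULL \Rightarrow ST' = ST + NOTNULL \end{array}}{\Phi(P + 5) \sqsupseteq \Pi[ST']} \quad \text{(T-16)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{invokestatic}\ IND1\ IND2 \\ (MNAM, (\overline{TY_n})NOTNULL\_VOID, \_) = mResolSt(IND1, IND2) \\ \Phi(P) = \Pi[ST + \overline{TYS_n}] \\ invoConvert(TYS_i, TY_i) \quad (i = 1, \ldots, n) \\ MNAM \neq \texttt{<init>} \\ NOTNULL\_VOID = void \Rightarrow ST' = ST \\ NOTNULL\_VOID = NOTNULL \Rightarrow ST' = ST + NOTNULL \end{array}}{\Phi(P + 3) \sqsupseteq \Pi[ST']} \quad \text{(T-17)}$$

Figure 15: The typing rules for other method invocation instructions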
Rules (T-15), (T-16) and (T-17) are for invokevirtual, invokeinterface
and invokestatic. They are very similar to the third case of rule (T-14).
One difference is that they use the resolution functions mResolV, mResolI and
mResolSt, respectively, instead of mResolSp. In addition, rule (T-16) needs to
deal with the numbers BYT1 and BYT2 explicitly occurring in the
invokeinterface instruction. The invokestatic instruction does not need an
object on which the method is invoked.



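$$\frac{Prg(P) = \texttt{areturn}\ \text{or}\ \texttt{ireturn}}{\Sigma[P, J + [(L', S', P')], L, S + WRD] \Longrightarrow \Sigma[P', J, L', S' + WRD]} \quad \text{(S-17),(S-18)}$$

$$\frac{Prg(P) = \texttt{return}}{\Sigma[P, J + [(L', S', P')], L, S] \Longrightarrow \Sigma[P', J, L', S']} \quad \text{(S-19)}$$

Figure 16: The state transitions for return instructions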



The state transition rules for return instructions are given in Figure 16. The
state transition uses the return address P' stored in the current Java stack.



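$$\frac{\begin{array}{c} Mth(P) = \texttt{areturn} \\ \Phi(P) = \Pi[ST + REFS] \\ (\_, (\overline{TY_n})REF, \_) = mInfo(P) \\ widRefConvert(REFS, REF) \end{array}}{\ } \quad \text{(T-18)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{ireturn} \\ \Phi(P) = \Pi[ST + int] \\ (\_, (\overline{TY_n})int, \_) = mInfo(P) \end{array}}{\ } \quad \text{(T-19)}$$

$$\frac{\begin{array}{c} Mth(P) = \texttt{return} \\ (\_, MNAM, (\overline{TY_n})void, \_) = mInfo(P) \\ MNAM = \texttt{<init>} \Rightarrow \Phi(P) = \Pi[Initd] \end{array}}{\ } \quad \text{(T-20)}$$

Figure 17: The typing rules for return instructions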



The typing rules for return instructions are given in Figure 17. The rules need
no additional explanations. The only thing that is worth mentioning is that a
return instruction may be used to terminate an <init> method. In this case,
the rule checks whether the initialization tag is Initd to assure that the <init>
method has indeed invoked another <init> method. Note that if the <init>
method is in Object, then the tag has been set to Initd at the beginning of the
method.
Note that in general, there may exist some uninitialized objects in the operand
stack or local variables when a method terminates. However, there is no possi- 
bility to pass an uninitialized object to the invoking method (see Theorem^. 

10.6 Implementing Finally-Clauses 

According to the OJVMS, jsr and ret instructions are control transfer in- 
structions typically used to implement finally clauses in Java. Following the 
OJVMS, we call the program point, to which a jsr instruction jumps, a jsr 
target, and the code starting from a jsr target a subroutine. If no ambiguity is 
possible, we also call a jsr target a subroutine. Roughly speaking, a jsr instruc- 
tion calls a subroutine and a ret instruction returns from a subroutine. But, 
formally a subroutine need not have a ret instruction. We use sb to range over 
all jsr targets (i.e. subroutines) and write SB as a variable for them. 



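$$\frac{\begin{array}{c} Prg(P) = \texttt{jsr}\ BYT1\ BYT2 \\ P = (\_, OFF) \end{array}}{\Sigma[P, S] \Longrightarrow \Sigma[P + offset(BYT1, BYT2), S + OFF]} \quad \text{(S-20)}$$

$$\frac{\begin{array}{c} Prg(P) = \texttt{ret}\ IND \\ P = (COD, \_) \end{array}}{\Sigma[P, L] \Longrightarrow \Sigma[(COD, L(IND) + 3), L]} \quad \text{(S-21)}$$

Figure 18: The state transitions for jsr and ret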



The state transitions for jsr and ret are given in Figure 18. Rule (S-20)
says that a jsr instruction pushes the byte offset OFF of the current
program point onto the operand stack and transfers control to the jsr target
$P + offset(BYT1, BYT2)$.

Rule (S-21) is for ret. It uses a byte offset in a local variable to compute the
program point following the jsr as the returning program point.
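The interplay of the two rules can be traced with a small Python sketch. It assumes that a program point is a pair `(COD, OFF)`, that adding a branch offset to a program point only changes the OFF component, and that the branch offset has already been decoded from the two operand bytes. The names `step_jsr` and `step_ret` are hypothetical.

```python
def step_jsr(state, branch_offset):
    """jsr: push the byte offset of the current program point, jump to the target."""
    (cod, off), lvs, stk = state
    return ((cod, off + branch_offset), lvs, stk + [off])


def step_ret(state, ind):
    """ret: continue at the program point following the jsr whose offset is in L(IND)."""
    (cod, _), lvs, stk = state
    return ((cod, lvs[ind] + 3), lvs, stk)


# A jsr at offset 10 of a hypothetical method code "m" calls a subroutine at offset 30.
after_jsr = step_jsr((("m", 10), {}, []), 20)
# Assume the subroutine has stored the pushed offset in local variable 1
# and has reached a ret instruction at offset 40.
after_ret = step_ret((("m", 40), {1: 10}, []), 1)
```

The constant 3 is the length of the jsr instruction, so execution resumes immediately after the call.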

Typing jsr and ret is complex, since the OJVMS requires the following 
features: 

— Not every path in a subroutine needs to reach a ret instruction. A subroutine 
implicitly terminates whenever the current method terminates. 

— Subroutines may be nested: a subroutine can call another subroutine. (This 
feature is useful in implementing nested finally clauses.) 

— In nested subroutines, an inner subroutine may contain a ret instruction 
that directly returns to an arbitrary outer subroutine. 

— During the execution, a returning program point can never be used more 
than once by a ret instruction. Furthermore, at the outer program point, to 
which a ret instruction in an inner subroutine directly returns, no returning 
program point for a subroutine between the inner and the outer subroutine 
should still be able to be used as a returning program point. 
Technically, the mechanism should be more complex, since the OJVMS still
takes three additional situations into account. First, the implementation of a
finally clause often needs to be reachable from different execution paths.
Second, different execution paths often have to use a common local variable to
hold their own contents that are incompatible with each other. Third, the content
stored in a local variable in an execution path before the execution of a finally
clause may need to be used after the execution of the finally clause. As an
example for all these three situations, we can consider the implementation of
a try-catch-finally clause. More concretely, the finally clause needs to be
reachable from the end of the try clause and from the beginning of the catch
clause, the try clause needs to store a return integer value in a local variable
for use after the execution of the finally clause, but the catch clause stores an
exception in the same local variable for use after the execution of the finally
clause as well.

The problem is that since a common local variable may hold incompatible 
contents, as described in the second situation above, the usual typing rules in our 
formal specification would force the local variable in and, in particular, after the 
finally clause to have the type unusable. Therefore a use of the local variable 
in an individual execution path after the finally clause, as described in the 
third situation, would be impossible. 

To solve the problem, the OJVMS suggests changing the usual typing process
such that in an execution path, if a local variable is not modified or accessed 
in a finally clause, then its type after the execution of the finally clause 
should be the same as before the execution of the finally clause. Thus we need 
a mechanism to record the local variables that are modified or accessed within a 
finally clause. The component mod in a program point type has been reserved 
for this purpose. Now we formally define what a component mod is: 



— First, we build a set grf of pairs of jsr targets, representing a directed acyclic
graph.

— Then we build a set csb of jsr targets.

— Finally, a component mod is a mapping such that $Dom(mod) = grf \cup csb$, and
$mod(sb, sb')$ for $(sb, sb') \in grf$ and $mod(sb)$ for $sb \in csb$ are sets of indices
of local variables.

Intuitively, a pair (sb, sb') in a grf should denote a call of the subroutine sb' 
inside the subroutine sb, and grf should contain nested non-recursive subroutine 
calls that may reach the current program point. A set grf need not be a tree, 
since more than one subroutine may contain a call of the same subroutine and 
one subroutine may contain calls of more than one subroutine. A set csb should 
contain current subroutines, i.e. those subroutines that contain the current
program point. The set $mod(sb, sb')$ for $(sb, sb') \in grf$ should contain the indices of
all local variables that may be modified or accessed in an execution path from
sb to sb', and $mod(sb)$ for $sb \in csb$ those from sb to the current address.
We define the following notations:

$$nod(mod) := \{sb \mid (sb, \_) \text{ or } (\_, sb) \text{ or } sb \in Dom(mod)\}$$

$$grf(mod) := \{(sb, sb') \mid (sb, sb') \in Dom(mod)\}$$

$$csb(mod) := \{sb \mid sb \in Dom(mod)\}$$

We define that $mod \sqsupseteq mod'$ holds if and only if $grf(mod) \supseteq grf(mod')$
and $csb(mod) \supseteq csb(mod')$ hold, $mod(sb, sb') \supseteq mod'(sb, sb')$ holds for each
$(sb, sb') \in grf(mod')$ and $mod(sb) \supseteq mod'(sb)$ holds for each $sb \in csb(mod')$.



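\[
\frac{
  Mth(P) = \mathtt{jsr}\ BYT1\ BYT2 \qquad
  \Phi(P) = \Pi[ST, M] \qquad
  SB = P + offset(BYT1, BYT2)
}{
  \begin{array}{l}
  SB \notin nod(M) \\
  \Phi(SB) \sqsupseteq \Pi[ST + sbr(SB),\
    M|_{grf(M)} \cup \{(sb, SB) \mapsto M(sb) \mid sb \in csb(M)\} \cup \{SB \mapsto \emptyset\}]
  \end{array}
}
\]

\[
\frac{
  Mth(P) = \mathtt{ret}\ IND \qquad
  \Phi(P) = \Pi[LT]
}{
  \begin{array}{l}
  LT(IND) = sbr(\_) \\
  \forall P'\, \forall IND'\, \forall \Pi'\, \forall LT'.\
    \bigl( (Mth(P') = \mathtt{ret}\ IND' \wedge P' \neq P \wedge \Phi(P') = \Pi'[LT'])
    \Rightarrow LT(IND) \neq LT'(IND') \bigr)
  \end{array}
}
\]

\[
\frac{
  \begin{array}{l}
  Mth(P) = \mathtt{ret}\ IND \qquad \Phi(P) = (LT, ST, IT, M) \\
  Mth(P') = \mathtt{jsr}\ BYT1\ BYT2 \qquad SB = P' + offset(BYT1, BYT2) \\
  LT(IND) = sbr(SB) \qquad \Phi(P') = \Pi'[LT', ST', M']
  \end{array}
}{
  \begin{array}{l}
  \Phi(P' + 3) \sqsupseteq (LT'[j \mapsto invld(LT(j), subrs(SB, M)) \mid j \in mlvs(SB, M)], \\
  \qquad invld(ST, subrs(SB, M)),\ IT, \\
  \qquad reachMod(M, \{sb \mid (sb, SB) \in \mathcal{D}om(M)\})\ \cup \\
  \qquad \{sb \mapsto M'(sb, SB) \cup mlvs(SB, M) \mid (sb, SB) \in \mathcal{D}om(M)\})
  \end{array}
}
\]

Figure 19: The typing rules for ret and jsr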



The typing rules for jsr and ret are given in Figure 19. We first consider
the rule for jsr. The rule defines only one constraint at the program point P of a
jsr instruction, namely $SB \notin nod(M)$, which assures that the called subroutine
SB is not called recursively. At the beginning of the subroutine SB, the new M
records the addition of the edges (sb, SB) representing the calls of SB inside all
old current subroutines in csb(M), the elimination of the old current subroutines
in csb(M), and the addition of the new current subroutine SB, where $SB \mapsto \emptyset$ denotes
that no local variables have been modified or accessed since the beginning of the
new current subroutine SB.
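
The update of M at a jsr can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `ModComponent`, its fields and the method `enterSubroutine` are hypothetical names that do not occur in the chapter, subroutines are identified by their entry addresses, and an edge (sb, sb') is stored as a nested map entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical encoding of a mod component; subroutines are named by their entry addresses. */
final class ModComponent {
    /** grf part: grf.get(sb).get(sb2) is the set mod(sb, sb2). */
    final Map<Integer, Map<Integer, Set<Integer>>> grf = new HashMap<>();
    /** csb part: csb.get(sb) is the set mod(sb). */
    final Map<Integer, Set<Integer>> csb = new HashMap<>();

    /** nod(mod): all subroutines occurring as an edge endpoint or as a current subroutine. */
    Set<Integer> nod() {
        Set<Integer> nodes = new HashSet<>(csb.keySet());
        for (Map.Entry<Integer, Map<Integer, Set<Integer>>> e : grf.entrySet()) {
            nodes.add(e.getKey());
            nodes.addAll(e.getValue().keySet());
        }
        return nodes;
    }

    /** The mod component at the entry of subroutine sbEntry, following the jsr rule. */
    ModComponent enterSubroutine(int sbEntry) {
        // Constraint of the rule: SB must not already occur in nod(M).
        if (nod().contains(sbEntry)) {
            throw new IllegalStateException("subroutine " + sbEntry + " would be called recursively");
        }
        ModComponent next = new ModComponent();
        // Keep M restricted to grf(M): existing edges are copied unchanged.
        for (Map.Entry<Integer, Map<Integer, Set<Integer>>> e : grf.entrySet()) {
            Map<Integer, Set<Integer>> callees = new HashMap<>();
            for (Map.Entry<Integer, Set<Integer>> c : e.getValue().entrySet()) {
                callees.put(c.getKey(), new HashSet<>(c.getValue()));
            }
            next.grf.put(e.getKey(), callees);
        }
        // Every old current subroutine sb contributes the edge (sb, sbEntry) labelled with M(sb).
        for (Map.Entry<Integer, Set<Integer>> e : csb.entrySet()) {
            next.grf.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                    .put(sbEntry, new HashSet<>(e.getValue()));
        }
        // The callee becomes the only current subroutine, with no local variable touched yet.
        next.csb.put(sbEntry, new HashSet<>());
        return next;
    }
}
```

Starting from an empty component, `enterSubroutine(13)` followed by `enterSubroutine(18)` yields the edge (13, 18) and the single current subroutine 18, while a further `enterSubroutine(13)` is rejected as a recursive call.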

The second rule is for ret. The constraint $LT(IND) = sbr(\_)$ assures that the
local variable IND holds a byte offset. The constraint $\forall P'\, \forall IND'\, \forall \Pi'\, \forall LT'. \cdots$
assures that the method Mth has at most one ret instruction for the same
subroutine. This is not a serious restriction, since whenever two ret instructions
are needed, one can always write the first ret and, at the place of the second ret,
a goto instruction to the first ret.

The third rule introduces constraints for the program type at the returning
program point P' + 3, to which a ret at P returns, where the calling jsr of the
subroutine is at P'.

The formulation of the third rule uses several new auxiliary functions.

The first auxiliary function computes the set of the indices of all local
variables that may be modified or accessed in an execution path from a given
program point to the current program point. For the component mod in a program
point type and a subroutine $sb \in nod(mod)$, we define

\[
mlvs(sb, mod) :=
\begin{cases}
\bigcup_{(sb, sb') \in grf(mod)} \bigl(mod(sb, sb') \cup mlvs(sb', mod)\bigr) & \text{if } sb \notin csb(mod) \\
mod(sb) & \text{if } sb \in csb(mod)
\end{cases}
\]

The term mlvs(SB, M) in the third rule is a set containing the indices of all local
variables that may be modified or accessed from SB to P.
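
The recursion in mlvs can be made concrete with a small Java sketch. The class name `Mlvs` and the two-map representation (an edge map for the grf part and a separate map for the csb part) are assumptions made for illustration only. The recursion terminates because grf is required to be acyclic.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper computing mlvs(sb, mod) for a mod split into its grf and csb parts. */
final class Mlvs {
    /**
     * grf.get(sb).get(sb2) stands for mod(sb, sb2); csb.get(sb) stands for mod(sb).
     * Returns the indices of the local variables possibly touched from sb to the current point.
     */
    static Set<Integer> mlvs(int sb,
                             Map<Integer, Map<Integer, Set<Integer>>> grf,
                             Map<Integer, Set<Integer>> csb) {
        // Second case of the definition: sb is a current subroutine.
        if (csb.containsKey(sb)) {
            return new TreeSet<>(csb.get(sb));
        }
        // First case: union over all outgoing edges (sb, sb2) of mod(sb, sb2) and mlvs(sb2, mod).
        Set<Integer> result = new TreeSet<>();
        Map<Integer, Set<Integer>> out =
                grf.getOrDefault(sb, Collections.<Integer, Set<Integer>>emptyMap());
        for (Map.Entry<Integer, Set<Integer>> edge : out.entrySet()) {
            result.addAll(edge.getValue());
            result.addAll(mlvs(edge.getKey(), grf, csb));
        }
        return result;
    }
}
```

For a component with the single edge (13, 18) labelled {2} and the current subroutine 18 labelled with the empty set, the call for subroutine 13 returns {2}.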

The second auxiliary function computes all subroutines called from the call
of a given outer subroutine to the current subroutine. For the component mod
in a program point type and a subroutine $sb \in nod(mod)$, we define

\[
subrs(sb, mod) :=
\begin{cases}
\bigcup_{(sb, sb') \in grf(mod)} \bigl(\{sb\} \cup subrs(sb', mod)\bigr) & \text{if } sb \notin csb(mod) \\
\{sb\} & \text{if } sb \in csb(mod)
\end{cases}
\]



In order to change all subroutine types of those subroutines in a given set
of subroutines E into invalid subroutine types, we define the following function
invld(any, E):

\[
invld(any, E) :=
\begin{cases}
invldsbr & \text{if } any = sbr(sb) \text{ and } sb \in E \\
any & \text{otherwise}
\end{cases}
\]

Note that any in the second line can be an arbitrary static type.

For convenience, we lift the function invld to operand stack types:

\[
invld([any_0, \cdots, any_m], E) := [invld(any_0, E), \cdots, invld(any_m, E)]
\]
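
The interplay of subrs and invld can be sketched in Java as follows. The class `InvldSketch`, the minimal `Type` stand-in for static types, and the simplified parameters (an edge map and a set of current subroutines instead of a full mod) are assumptions introduced for this example; they are not part of the specification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class InvldSketch {
    /** Minimal stand-in for static types: either sbr(target) or some other named type. */
    static final class Type {
        final String name;     // e.g. "C", "unusable", "invldsbr", or "sbr"
        final Integer target;  // entry address for subroutine types, null otherwise
        private Type(String name, Integer target) { this.name = name; this.target = target; }
        static Type sbr(int target) { return new Type("sbr", target); }
        static Type named(String name) { return new Type(name, null); }
        @Override public String toString() { return target == null ? name : "sbr(" + target + ")"; }
    }

    static final Type INVLDSBR = Type.named("invldsbr");

    /** subrs(sb, mod), with edges.get(sb) listing every sb2 such that (sb, sb2) is in grf(mod). */
    static Set<Integer> subrs(int sb, Map<Integer, Set<Integer>> edges, Set<Integer> csb) {
        Set<Integer> result = new TreeSet<>();
        if (csb.contains(sb)) {
            result.add(sb);
            return result;
        }
        for (int callee : edges.getOrDefault(sb, Collections.<Integer>emptySet())) {
            result.add(sb);
            result.addAll(subrs(callee, edges, csb));
        }
        return result;
    }

    /** invld(any, E): only subroutine types of subroutines in E are invalidated. */
    static Type invld(Type any, Set<Integer> e) {
        return (any.target != null && e.contains(any.target)) ? INVLDSBR : any;
    }

    /** invld lifted pointwise to operand stack types. */
    static List<Type> invld(List<Type> stack, Set<Integer> e) {
        List<Type> out = new ArrayList<>();
        for (Type t : stack) {
            out.add(invld(t, e));
        }
        return out;
    }
}
```

With the edge (13, 18) and the current subroutine 18, `subrs(13, ...)` gives {13, 18}, so both sbr(13) and sbr(18) are turned into invldsbr while any other static type is left unchanged.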



In order to compute the part of a mod from which a subroutine in a given set
of subroutines E can be reached, we define the following function reachMod(mod, E):

\[
reachMod(mod, E) :=
\begin{cases}
\{(sb, sb') \mapsto mod(sb, sb') \mid (sb, sb') \in \mathcal{D}om(mod),\ sb' \in E\} \\
\quad \cup\ reachMod(mod, \{sb \mid (sb, sb') \in \mathcal{D}om(mod),\ sb' \in E\}) & \text{if } E \neq \emptyset \\
\emptyset & \text{if } E = \emptyset
\end{cases}
\]
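
A possible Java rendering of reachMod is given below as a sketch. The class name `ReachMod` and the nested-map encoding of the edges are assumptions for illustration; only the grf part of mod is passed in, since the result never contains entries for current subroutines.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: grf.get(sb).get(sb2) stands for mod(sb, sb2). */
final class ReachMod {
    static Map<Integer, Map<Integer, Set<Integer>>> reachMod(
            Map<Integer, Map<Integer, Set<Integer>>> grf, Set<Integer> targets) {
        Map<Integer, Map<Integer, Set<Integer>>> result = new HashMap<>();
        // Base case: E is empty.
        if (targets.isEmpty()) {
            return result;
        }
        // Keep every edge (sb, sb2) whose callee sb2 lies in E and remember its caller sb.
        Set<Integer> callers = new HashSet<>();
        for (Map.Entry<Integer, Map<Integer, Set<Integer>>> e : grf.entrySet()) {
            for (Map.Entry<Integer, Set<Integer>> c : e.getValue().entrySet()) {
                if (targets.contains(c.getKey())) {
                    result.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                          .put(c.getKey(), c.getValue());
                    callers.add(e.getKey());
                }
            }
        }
        // Recursive case: continue with the callers; this terminates because grf is acyclic.
        for (Map.Entry<Integer, Map<Integer, Set<Integer>>> e : reachMod(grf, callers).entrySet()) {
            result.computeIfAbsent(e.getKey(), k -> new HashMap<>()).putAll(e.getValue());
        }
        return result;
    }
}
```

Given the edges (1, 2), (2, 3) and (4, 5), a call with the target set {3} keeps (2, 3) and (1, 2) and drops (4, 5), because subroutine 3 cannot be reached through that edge.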

In the third rule, the applicability conditions $\Phi(P) = (LT, \cdots)$, Mth(P') =
jsr BYT1 BYT2, SB = P' + offset(BYT1, BYT2) and LT(IND) = sbr(SB)
assure that the ret at P causes the subroutine SB to return to the next program
point P' + 3 of the calling jsr at P'. Note that the constraint
$\forall P'\, \forall IND'\, \forall \Pi'\, \forall LT'. \cdots$ in the second rule enforces that there exists at most
one P for a jsr at P' in the third rule.

The third rule expresses the following relationship between the program types
at P, P' and P' + 3, where the jsr at P' calls a subroutine SB, and the ret at P
returns from the subroutine SB to P' + 3:

— If a local variable is definitely not modified or accessed from SB to P, then 
its static type at P' + 3 covers that at P'; otherwise, i.e. if the local variable 
may be modified or accessed from SB to P, then its static type at P' + 3 
covers that at P, except that if its static type at P is a subroutine type for 
a subroutine possibly called from SB to P, then its static type at P' + 3 
covers invldsbr. 

— The operand stack type at P' + 3 covers that at P, except that if an operand 
stack entry at P has a subroutine type for a subroutine called from SB to 
P, then the static type of the entry at P' + 3 covers invldsbr. 

— The initialization tag at P' + 3 covers that at P. 

— The subroutines called until P' + 3 include all those called until P but not 
from SB to P. The local variables possibly modified or accessed from the 
call of a current subroutine to P' + 3 include those from the call of the same 
current subroutine to SB plus all those possibly modified or accessed from 
SB to P. 

A final tricky point is that although the ret instruction in the third rule accesses
all those local variables that have a subroutine type for a subroutine called from
P' to P, the typing rule need not treat this explicitly. The reason is that the
indices of these variables are all contained in the set mlvs(SB, M) in that rule.
In fact, if a local variable holds a program point of a jsr instruction between
SB and P, then the program point must be stored in the local variable by an
astore instruction between SB and P. By the typing rule for astore (see the
discussion below) and the definition of mlvs, the index of the local variable must
be included in the set mlvs(SB, M).

10.7 On the Instructions that Modify or Access Local Variables 

Now it is time to give the precise definitions of the term Mod', the term
$mod_0$ and the terms $Mod_1$ and $Mod_2$ that occur in the typing rules of earlier figures.

We first consider the typing rules that use Mod'. Given the notations of those rules,
we formally define

\[
Mod' := M|_{grf(M)} \cup \{sb \mapsto M(sb) \cup \{IND\} \mid sb \in csb(M)\}
\]

The typing rule for the starting program point of a method introduces the program
point type for that program point. We define $mod_0 := \emptyset$.

Now we consider the rule for invokespecial. Since the first
two cases in this rule consider the initialization of a raw object, we regard all
those local variables whose contents reference the raw object as being modified.
Given the notations of that rule, we formally define

\[
\begin{aligned}
Mod_1 &:= M|_{grf(M)} \cup \{sb \mapsto M(sb) \cup \{i \mid unin(P', CNAM) = LT(i)\} \mid sb \in csb(M)\} \\
Mod_2 &:= M|_{grf(M)} \cup \{sb \mapsto M(sb) \cup \{i \mid init(CNAM) = LT(i)\} \mid sb \in csb(M)\}
\end{aligned}
\]

Note that the third case in the rule does not deal with the initialization of a raw
object, and thus does not cause an extension of M.

11 Examples 

In this section we use real methods to illustrate how to check whether a given 
method has a given program type. 

For notational simplicity, some instructions are abbreviated as follows: 

— Each instruction

opcode byt1 byt2

at the program point pp with $opcode \in \{$if_acmpeq, if_icmpeq, goto, jsr$\}$ and
pp = (_, off) is abbreviated as

opcode n

with $n = off + (byt1 \cdot 2^8) + byt2$.

— Each instruction

opcode ind1 ind2

with $opcode \in \{$getfield, putfield, new, invokespecial, invokevirtual,
invokeinterface, invokestatic$\}$ is abbreviated as

opcode #n

with $n = (ind1 \cdot 2^8) + ind2$. The instruction

invokeinterface ind1 ind2 byt 0

is abbreviated as invokeinterface #n with $n = (ind1 \cdot 2^8) + ind2$.
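
The two abbreviations amount to a small decoding step, sketched here in Java. The class and method names are hypothetical. The sketch also makes one assumption that the formula above does not state: following the Java Virtual Machine Specification, the two branch bytes are read as a signed 16-bit offset, so that backward branches come out correctly.

```java
/** Hypothetical helpers for the abbreviations "opcode n" and "opcode #n". */
final class OperandDecoding {
    /**
     * Target of a branch instruction at byte offset off with operand bytes byt1 and byt2.
     * Assumption: the combined value byt1 * 2^8 + byt2 is interpreted as a signed 16-bit offset.
     */
    static int branchTarget(int off, int byt1, int byt2) {
        short offset = (short) (((byt1 & 0xFF) << 8) | (byt2 & 0xFF));
        return off + offset;
    }

    /** Constant pool index n = ind1 * 2^8 + ind2, which is always unsigned. */
    static int poolIndex(int ind1, int ind2) {
        return ((ind1 & 0xFF) << 8) | (ind2 & 0xFF);
    }
}
```

For instance, a goto at offset 8 with operand bytes 0x00 and 0x05 is written goto 13, and index bytes 0x00 and 0x11 give the constant pool entry #17.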

Figure 20 gives the type checking for the first method. A row in Figure 20
contains a program point, i.e. an instruction, in the given method, the program
point type in the given program type at the program point, the typing rule
applied at the program point and all possible successor program points with
respect to the rule. Since the method does not deal with any subroutines or
instance initializations, we consider only the local variable table type and the
operand stack type in a program point type.

We assume that the declaration of the method void m(J1, J2) in Figure 20
is contained in a class C. Furthermore, we assume that J1 and J2 are two
interfaces, and the entry at the index #17 in the constant pool references a
method in a superinterface of J1 and J2, which takes no parameters and yields
no result. At the program point 13, the static type of the top entry in the
operand stack needs to be represented as a set, since the top entry may be the
first or second actual parameter and the interfaces J1 and J2 need not have one
smallest common superinterface. The typing rule for invokeinterface is applied at
the program point 13, where the constraint widRefConvert(REFS, INAM) in the rule
assures that the invoked method must exist in a superinterface of J1 and J2.



The method                LT             ST            Rule for          Succ
Method void m(J1,J2)                                   method start      0
 0 aconst_null            [C, J1, J2]    []            aconst_null       1
 1 aload 1                [C, J1, J2]    [null]        aload             3
 3 if_acmpeq 11           [C, J1, J2]    [null, J1]    if_acmpeq         6, 11
 6 aload 1                [C, J1, J2]    []            aload             8
 8 goto 13                [C, J1, J2]    [J1]          goto              13
11 aload 2                [C, J1, J2]    []            aload             13
13 invokeinterface #17    [C, J1, J2]    [{J1,J2}]     invokeinterface   18
18 return                 [C, J1, J2]    []            return

Figure 20: A method containing an interface method invocation



The second example in Figure 21 shows the use of subroutine types. The
method contains two jsr instructions at 0 and 9 calling the subroutine 13. The
subroutine 13 contains a jsr at 15 calling the (inner) subroutine 18, and the
subroutine 18 directly returns to the corresponding calling jsr of the (outer)
subroutine 13, i.e. to 3 or 12. After the return, i.e. at 3 and 12, the subroutine
types sbr(13) and sbr(18) are changed into invldsbr. The local variable 1 has
different static types at the two calling jsr, i.e. at 0 and 9. Since the local
variable 1 is not modified or accessed in the subroutine 13, after the return of
the subroutine, i.e. at 3 and 12, the static type of the local variable 1 is the same
as that at the calling jsr, i.e. at 0 and 9.



12 Static Well-Typedness vs. Runtime Properties 

The OJVMS requires that the type-correctness of nearly all runtime uses of data 
is checked statically. In our formal specification, which considers a subset of the 
JVM, we can formally prove that if a program is statically well-typed, then all 
runtime data to be used will definitely have correct types. For doing this, we 
first need to define precisely what are the types of runtime data. 

The method         LT                         ST            Rule for       Succ
Method void m()                                             method start   0
 0 jsr 13          [C, unusable, unusable]    []            jsr            13
 3 astore 2        [C, unusable, invldsbr]    [invldsbr]    astore         5
 5 aload 0         [C, unusable, invldsbr]    []            aload          7
 7 astore 1        [C, unusable, invldsbr]    [C]           astore         9
 9 jsr 13          [C, C, invldsbr]           []            jsr            13
12 return          [C, C, invldsbr]           [invldsbr]    return
13 astore 2        [C, unusable, unusable]    [sbr(13)]     astore         15
15 jsr 18          [C, unusable, sbr(13)]     []            jsr            18
18 ret 2           [C, unusable, sbr(13)]     [sbr(18)]     ret
                                                            ret            3
                                                            ret            12

Figure 21: A method containing subroutines



12.1 Tags of Runtime Data 

In the previous sections we have often informally mentioned the types of runtime 
data. Two examples are as follows: 

— In an earlier section we mentioned that a new instruction creates an object. This
informally implies that the created datum is a reference to an object.

— In an earlier section we mentioned that a jsr instruction pushes a byte offset
onto the operand stack.

However, the problem is that both an object reference and a byte offset are 
one-word wide data in our constraint domain. In other words, the type of a 
datum cannot be determined by the datum itself. Thus we need an additional 
mechanism to explicitly determine the type of a datum. 

The mechanism can be built in two steps: first, we define the possible types 
of runtime data; second, we extend the state transition relation to define the 
types of the contents in the local variables and the operand stack. 

A relatively simple set of possible types of runtime data, called tags, is defined 
as follows: 

tag ::= cnam | null | int | addr | undefined

Intuitively, the tag cnam should be the tag of the reference of each object 
of the class cnam, null that of the special reference null, and int that of each 
element of the primitive type int. As mentioned before, we need to deal with 
the byte offset of a jsr. Thus we introduce the tag addr for all byte offsets. The 
tag undefined indicates that the content of a local variable or an operand stack 
entry has not been explicitly defined by an instruction in the execution so far. 
We use tag to range over all tags. 

Note that the above set of tags is a relatively simple one, since it does not
contain anything to express that an object is a raw object or that an offset is of a
special subroutine type. In fact, adding such tags poses no problem, except that
the definition of the types of the contents in the local variables and the operand
stack would become more complicated. We consider the above simple set due to
space limits in this chapter.

To record the type of the content of each local variable and each operand stack
entry, lists of tags of the form $[tag_0, \cdots, tag_n]$ with $n \geq -1$ are introduced. We
define $[tag_0, \cdots, tag_n](k) := tag_k$ if $0 \leq k \leq n$, and $[tag_0, \cdots, tag_n](k) := failure$
otherwise. A list of the above form is called a local variable state tag if it consists
of the types of the contents in all local variables; it is called an operand stack
state tag if it consists of the types of the contents of an operand stack.

For readability, we use lvstag to range over all local variable state tags, and
stktag over all operand stack state tags. For notational simplicity, we write LG
as a variable for the sort lvstag, and SG for the sort stktag.

A local variable state tag and an operand stack state tag do not record the
type of an object that is held by a field of another object but not directly by
a local variable or an operand stack entry. Thus we still need to introduce a
class record as a mapping $\{obj_n \mapsto cnam_n\}$. A class record as above maps all
elements other than the $obj_n$ to a special value failure. We use classof to range over
all concrete class records and C as a variable for the sort classof.

In order to record the local variable state tags and the operand stack state
tags for the methods stored in a Java stack, we define a Java stack tag as a list
consisting of entries of the form (lvstag, stktag). We use jstktag to range over all
Java stack tags and use JG as a variable of the sort jstktag.

We define a program state tag as a tuple (jstktag, classof, lvstag, stktag) and
in the rest of the chapter use statag to range over all program state tags.

Finally, we define that an extended program state is a pair (stat, statag), where
stat = (pp, jstk, lvs, stk, hp), statag = (jstktag, classof, lvstag, stktag), size(jstk)
= size(jstktag), size(lvs) = size(lvstag) and size(stk) = size(stktag) hold.

Now we extend the state transition rules given earlier. Let us call an extended
state transition rule an extended rule and an original state transition rule an
original rule in this section.

In order to ensure that the extended rule relation does not affect the original
state transition relation, we require that if an original rule is of the
form

\[
\frac{Premises}{Stat \Rightarrow Stat'}
\]

then the extended rule obtained from it is always of the form

\[
\frac{Premises}{Stat, Statag \Rightarrow Stat', Statag'}
\]

satisfying that

— $\mathcal{FV}(Statag') \subseteq \mathcal{FV}(Statag)$

— for every two program states stat and stat' and every extended program state
(stat, statag), if there is a substitution $\sigma$ such that $\mathcal{D}om(\sigma) = \mathcal{FV}(Premises)
\cup \mathcal{FV}(Stat \Rightarrow Stat')$, $stat = \sigma(Stat)$, $stat' = \sigma(Stat')$ and $\sigma(Premises)$
hold, then there is a substitution $\sigma'$ such that $\mathcal{D}om(\sigma') = \mathcal{D}om(\sigma) \cup
\mathcal{FV}(Statag)$ and $\sigma'(Statag) = statag$ hold, and $(stat', \sigma'(Statag'))$ is an
extended program state.

For notational simplicity, we always omit the Premises-, Stat- and Stat'-parts
in the definition of an extended rule in this section. Note that the Statag- and
Statag'-parts may contain variables occurring in the Stat- and Stat'-parts.

In many extended rules, the Java stack tags are not changed and the local
variable state tags (or the operand stack state tags) are changed in a completely
analogous way as the local variable states (or the operand stack states,
respectively). The extended rules for aload and new are two such extended rules. We
give their definitions in Figure 22 and omit the explicit presentation of other
such extended rules due to space constraints.

\[
\begin{array}{ll}
(JG, C, LG, SG) \Rightarrow (JG, C, LG, SG + LG(IND)) & \text{(aload)} \\
(JG, C, LG, SG) \Rightarrow (JG, C[OBJ \mapsto CNAM], LG, SG + CNAM) & \text{(new)}
\end{array}
\]

Figure 22: The extended rules for aload and new

Figure 23 contains the extended rule for getfield. The rule is slightly tricky,
since the way to get the tag of the loaded content depends on whether the loaded
content is an object or not. If it is an object, then the tag should be obtained
from the class record classof in the program state. If it is a value of a primitive
type, then the tag should be the primitive type. (In this chapter the only primitive
type is the type int.) To model this, we define the following auxiliary function,
which yields the tag of the content held by the field fnam of the type notnull in
the object obj.

\[
seltag(fnam, notnull, obj, hp, classof) :=
\begin{cases}
classof(hp(obj)(fnam)) & \text{if } notnull \text{ is a } cnam \\
int & \text{if } notnull \text{ is } int \\
failure & \text{otherwise}
\end{cases}
\]
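
The case distinction in seltag can be illustrated by the following Java sketch. All names are assumptions made for the example: objects and one-word values are modelled as integers, a record as a map from field names to values, tags as strings, and the special value failure as an empty `Optional`.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of seltag; an empty Optional plays the role of failure. */
final class SelTag {
    /** Kind of the declared field type notnull. */
    enum FieldKind { CLASS, INT, OTHER }

    static Optional<String> seltag(String fnam, FieldKind notnull, int obj,
                                   Map<Integer, Map<String, Integer>> hp,
                                   Map<Integer, String> classof) {
        switch (notnull) {
            case CLASS:
                // classof(hp(obj)(fnam)); any unmapped lookup yields failure.
                return Optional.ofNullable(hp.get(obj))
                        .map(rec -> rec.get(fnam))
                        .map(classof::get);
            case INT:
                // For the primitive type int, the tag is the type itself.
                return Optional.of("int");
            default:
                return Optional.empty();
        }
    }
}
```

The design choice to return `Optional` rather than throw keeps failure an ordinary value, which mirrors how the definition treats it.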



\[
(JG, C, LG, SG) \Rightarrow (JG, C, LG, SG + seltag(FNAM, NOTNULL, OBJ, H, C))
\]

Figure 23: The extended rule for getfield

The rules for method invocations change the Java stack
states. Thus their extended rules change the Java stack tags. Since these
extensions are very similar, we present only one of them. The situation is similar
for the rules for returning from a method. Thus we present only one of the two extended
rules. Figure 24 contains these two extended rules, where $undefined^k$ stands for a list
[undefined, ⋯, undefined] consisting of k times undefined.

\[
\begin{array}{ll}
(JG, C, LG, SG + TAG_0 + \cdots + TAG_n) \\
\quad \Rightarrow (JG + (LG, SG), C, undefined^{k}[i \mapsto TAG_i \mid 0 \leq i \leq n], []) & \text{(invokespecial)} \\[1ex]
(JG + (LG', SG'), C, LG, SG + TAG) \\
\quad \Rightarrow (JG + (LG', SG'), C, LG', SG' + TAG) & \text{(areturn)}
\end{array}
\]

Figure 24: The extended rules for invokespecial and areturn

\[
\begin{array}{ll}
(JG, C, LG, SG) \Rightarrow (JG, C, LG, SG + addr) & \text{(jsr)} \\
(JG, C, LG, SG) \Rightarrow (JG, C, LG, SG) & \text{(ret)}
\end{array}
\]

Figure 25: The extended rules for jsr and ret

Figure 25 contains the extended rules for jsr and ret. Note that in the rule
for ret, the program state tag does not change at all. The intuition is that a ret
may change the validity of some byte offsets. However, since we consider
only a simple tag addr for byte offsets, this intuition cannot be reflected. (As
mentioned, the simple tag addr could be replaced by a family of tags indexed
by all subroutines. But we do not consider them here.)

12.2 The Concepts for Runtime Type Safety 

To model the correctness of a tag tag with respect to a static type any, we
formally define a relation correct by:

\[
\begin{array}{ll}
correct(null, refs) \\
correct(cnam, refs) & \text{if } widRefConvert(cnam, refs) \\
correct(cnam, unin(\_, cnam)) \\
correct(cnam, init(cnam')) & \text{if } widRefConvert(cnam, cnam') \\
correct(int, int) \\
correct(addr, sbr(\_)) \\
correct(undefined, unusable) \\
correct(tag, any) & \text{if } correct(tag, any') \text{ and } any \sqsupseteq any'
\end{array}
\]

We also define that correct(lvstag, lvsty) holds if and only if size(lvstag) =
size(lvsty) and correct(lvstag(i), lvsty(i)) for all i = 0, ..., size(lvstag), and that
correct(stktag, stkty) holds if and only if size(stktag) = size(stkty) and
correct(stktag(i), stkty(i)) for all i = 0, ..., size(stktag).

For a heap hp and a class record classof, we define that correct(hp, classof)
holds if and only if the following conditions are true:

1. $\mathcal{D}om(hp) \subseteq \mathcal{D}om(classof)$.

2. For each $obj \mapsto rec \in hp$, if $(fnam, notnull) \in allFields(classof(obj))$, then
$fnam \in \mathcal{D}om(rec)$.

3. For each $obj \mapsto rec \in hp$ and $(fnam, notnull) \in allFields(classof(obj))$, if
notnull = ref, then $rec(fnam) \in \mathcal{D}om(classof)$.

4. For each $obj \mapsto rec \in hp$ and $(fnam, notnull) \in allFields(classof(obj))$, if
notnull = ref, then widRefConvert(classof(rec(fnam)), notnull).

Intuitively, condition 1 says that classof can determine the class of each object
in hp. Condition 2 assures that an object in hp always contains all fields required
by its class. Condition 3 assures that if an object in hp contains a field whose
type is a class or an interface, then the field holds an object whose class can be
determined by classof. Condition 4 says that the class of the object held by the
field in condition 3 is a subtype of the class or interface of the field.
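
The four conditions translate directly into a checking procedure, sketched below in Java. Everything outside the conditions themselves is an assumption for illustration: the class name `HeapCheck`, the encoding of objects and field contents as integers, the encoding of declared field types as strings with "int" as the only primitive type, and the parameters `allFields` and `widRefConvert`, which stand in for the functions of the same names defined elsewhere in the chapter.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical checker for correct(hp, classof). */
final class HeapCheck {
    static boolean correct(Map<Integer, Map<String, Integer>> hp,
                           Map<Integer, String> classof,
                           Map<String, Map<String, String>> allFields,
                           BiPredicate<String, String> widRefConvert) {
        for (Map.Entry<Integer, Map<String, Integer>> e : hp.entrySet()) {
            String cls = classof.get(e.getKey());
            if (cls == null) {
                return false;                       // condition 1 violated
            }
            Map<String, Integer> rec = e.getValue();
            Map<String, String> fields =
                    allFields.getOrDefault(cls, Collections.<String, String>emptyMap());
            for (Map.Entry<String, String> f : fields.entrySet()) {
                if (!rec.containsKey(f.getKey())) {
                    return false;                   // condition 2 violated
                }
                if (f.getValue().equals("int")) {
                    continue;                       // conditions 3 and 4 apply to references only
                }
                String held = classof.get(rec.get(f.getKey()));
                if (held == null) {
                    return false;                   // condition 3 violated
                }
                if (!widRefConvert.test(held, f.getValue())) {
                    return false;                   // condition 4 violated
                }
            }
        }
        return true;
    }
}
```

Passing `widRefConvert` as a predicate keeps the sketch independent of any particular class hierarchy; a caller would supply the subtype test of the program under consideration.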

Note that if $notnull \neq ref$, i.e. if notnull = int, then conditions 3 and 4 have
no effect. Thus one might wonder why we do not define a condition constraining
rec(fnam). The intuition is that if the runtime type of a datum is a primitive
type, then the runtime type is always the static type. Thus for $(fnam, int) \in
allFields(classof(obj))$ and $obj \mapsto rec \in hp$, the content rec(fnam) is always an
integer of the type int. Hence such a condition is useless.



12.3 Runtime Properties 

From now on, we assume that the program Prg has a program type prgty.
Formally we define an arbitrary execution of Prg as

\[
(stat_1, statag_1) \Rightarrow (stat_2, statag_2) \Rightarrow \cdots
\]

where each $(stat_i, statag_i)$ for $i = 1, 2, 3, \cdots$ is an extended program state, $stat_1$
is of the form $(pp_1, [], \cdots)$ and $Prg(pp_1)$ is of the form invokestatic ⋯.

We use $(stat_1, statag_1) \Rightarrow^* (stat_n, statag_n)$ with $n \geq 1$ to denote a zero or
more step execution

\[
(stat_1, statag_1) \Rightarrow \cdots \Rightarrow (stat_n, statag_n)
\]

For the rest of the chapter, we assume that

— $stat_i = (pp_i, jstk_i, lvs_i, stk_i, hp_i)$ for all $i = 1, 2, 3, \ldots$,

— $statag_i = (jstktag_i, classof_i, lvstag_i, stktag_i)$ for all $i = 1, 2, 3, \ldots$, and

— $prgty(pp_i) = ptty_{pp_i} = (lvsty_{pp_i}, stkty_{pp_i}, intag_{pp_i}, mod_{pp_i})$ for all
$i = 1, 2, 3, \ldots$. Note that $i \neq j$ does not imply that $pp_i \neq pp_j$.

Now we give some lemmas and theorems. Proofs are omitted due to space
limits.

The first theorem states the runtime type safety.

Theorem 1. In the execution $(stat_1, statag_1) \Rightarrow (stat_2, statag_2) \Rightarrow \cdots$, if
$correct(lvstag_1, lvsty_{pp_1})$, $correct(stktag_1, stkty_{pp_1})$ and $correct(hp_1)$ hold, then
$correct(lvstag_i, lvsty_{pp_i})$, $correct(stktag_i, stkty_{pp_i})$ and $correct(hp_i)$ hold for all i.

The proof follows from an induction on the length of the execution using the
extended rules and typing rules.

A practical consequence of Theorem 1 is as follows:

Corollary 1. An offset cannot be manipulated by an instruction described in
our formal specification except:

1. It can be created and stored onto the operand stack by a jsr.

2. It can be manipulated in the operand stack by the stack manipulation
instruction dup.

3. It can be stored from the operand stack in a local variable by an astore.

4. In a local variable, it can be used to compute the return address by a ret.

Now let us consider raw objects and instance initialization methods. The
following theorems can either be proved using a set of tags for runtime data that
is more refined than the current one, or by a careful analysis of all possible
executions. Note that Theorems 2 and 3 are not completely trivial, since a method
may pass values via the heap.

Theorem 2. Assume that a method invokes another method. Then the invoked
method can never pass a raw object back to the invoking method.

Theorem 3. Assume that a method invokes another method that is not an
<init>. Then the invoking method can never pass a raw object to the invoked
method.

It is very easy to show how an instance initialization method invokes another
instance initialization method.

Theorem 4. If an instance initialization method is not in class Object, then a
fragment of an execution path from the starting address to a return instruction
of the method always includes exactly one invocation of an instance initialization
method of the same class or the immediate superclass on the object being
initialized. If the instance initialization method is in class Object, then the fragment
includes no invocations of an instance initialization method on the object being
initialized.

Now we can state when the static type of a local variable or an operand stack 
entry ensures that it contains a raw object. 

Theorem 5. Assume that $(stat_1, statag_1) \Rightarrow (stat_2, statag_2) \Rightarrow \cdots$ is an
execution and $X \in \{lvs_h, stk_h\}$ and $XT \in \{lvsty_{pp_h}, stkty_{pp_h}\}$ with $h \geq 1$ are such
that X is $lvs_h$ if and only if XT is $lvsty_{pp_h}$ (and thus X is $stk_h$ if and only if XT
is $stkty_{pp_h}$).

— If XT(k) = unin(pp, cnam) holds for some k, pp and cnam, then X(k)
contains a reference to an uninitialized object of the class cnam created by a
new at pp.

— If XT(k) = init(cnam) holds for some k and cnam, then X(k) contains a
reference to an object of cnam that is being initialized inside an <init> and
has not been initialized by another <init>.

The following lemma shows that it is impossible for two different local
variables/operand stack entries at a program point to have the same static type
unin(pp, cnam) for some pp and cnam but hold references to different
uninitialized objects. In fact, the lemma states the correctness of the typing rule
for invokespecial on an instance initialization method, i.e., that if an object
in a local variable/operand stack entry with the static type unin(pp, cnam) is
initialized, then all occurrences of unin(pp, cnam) can be replaced by cnam.

Lemma 1. Assume that $(stat_1, statag_1) \Rightarrow (stat_2, statag_2) \Rightarrow \cdots$ is an
execution and $X, Y \in \{lvs_h, stk_h\}$ and $XT, YT \in \{lvsty_{pp_h}, stkty_{pp_h}\}$ with $h \geq 1$
such that X is $lvs_h$ if and only if XT is $lvsty_{pp_h}$, and Y is $lvs_h$ if and only if YT
is $lvsty_{pp_h}$. Then the following conditions cannot hold at the same time for the
indices k and k':

— XT(k) = YT(k') = unin(pp, cnam) holds for certain pp and cnam.

— X(k) and Y(k') contain references to different uninitialized objects created
by the same new at pp.

Now we know that if a memory location has a class as a static type, then it
always holds an initialized object or null.

Theorem 6. Assume that $(stat_1, statag_1) \Rightarrow (stat_2, statag_2) \Rightarrow \cdots$ is an
execution and $X \in \{lvs_h, stk_h\}$ and $XT \in \{lvsty_{pp_h}, stkty_{pp_h}\}$ with $h \geq 1$ such that
X is $lvs_h$ if and only if XT is $lvsty_{pp_h}$. If XT(k) = cnam holds for some k and
cnam, then X(k) contains a reference to an initialized object of cnam or null.

The typing rules for an instruction specify precisely how the instruction
behaves on an uninitialized object. The following theorem summarizes some of the
results:

Theorem 7.

1. A reference to an uninitialized object cannot be used in an
instruction described in our formal specification unless the instruction is dup, aload,
astore or invokespecial. In the case of invokespecial, the method must
be <init>, the object is the one being initialized and must be of the same
class as the <init>.

2. Inside a method <init> that is not declared in the class Object, there must be
a call to another <init> on the object being initialized via an invokespecial,
where the called <init> is declared in the same class as or in an immediate
superclass of that of the calling <init>. Before this call, the object being
initialized cannot be used in an instruction described in our formal specification
unless the instruction is dup, aload or astore.

13 Conclusion 

We have shown a formal specification of a substantial subset of JVM
instructions. The formal specification clarifies some ambiguities and incompleteness
and removes some (in our view) unnecessary restrictions in the description of
the official Java Virtual Machine Specification.

Finally, it is worth mentioning that our study on the semantics of the JVM
in this chapter led to the discovery of a possibility of writing a constructor that
invoked no other constructor in the JDK 1.1.4 implementation of the JVM,
which is clearly an implementation bug with respect to the official Java Virtual
Machine Specification (page 122).

Abstract. A formal specification of a Java Secure Processor is presented,
which is mechanically checked for type consistency, well-formedness
and operational conservativity. The specification is executable and
it is used to animate and study the behaviour of sample Java programs.

The purpose of the semantics is to document the behaviour of the complete
JSP for the benefit of implementors.

1 Introduction 

A smart card is a complete ‘embedded’ computer housed in a piece of plastic
the same size as a credit card. The computer has to be small to reduce the
risk of mechanical problems. Because of these mechanical constraints, as well as
aspects of cost, the current generation of smart cards typically contains only a
small 8-bit microprocessor, a few hundred bytes of RAM, a few Kbytes of ROM
and a few Kbytes of EEPROM. This small size constrains the freedom in the
design of the software that has to be run on a smart card processor.

Java was originally designed for writing embedded software. Because of
this pedigree it is attractive as a smart card programming language. Some
facilities provided by the Java language are too expensive to be implemented on a
smart card. Threads and dynamic class file loading fall in this category. Further
study is needed to find ways of incorporating the Java exception mechanism and
a garbage collector on smart cards. Smart cards do not use floating point
arithmetic, so this feature of Java is not needed. Using the subset of Java as described
above for smart cards is attractive. It is also feasible to implement this Java
subset on computers with limited resources.

The standard Java class libraries are not suitable for smart cards because
many of the facilities provided are meaningless on smart cards. Examples
include the interface to GUI libraries. Instead a smart card would host a specially
designed set of class libraries dedicated to the application domain of card
applications. The set of class libraries would be small enough to fit in the card and
would be versatile enough to provide standard smart card facilities, such as the
ISO 7816-4 command set, or downloadable applications for multi-application
smart cards.

A Java Secure Processor (JSP) is a virtual machine that is designed to fit on a
smart card. A JSP does not implement the full Java Virtual Machine (JVM).
Instead a JSP is accompanied by a JVM to JSP translator, which compiles
standard JVM byte codes into byte codes for the JSP. Java Soft has written
a sophisticated translator, which performs extensive program analysis to allow
a large class of Java programs to be run on the JSP. To support our work on
the formal definition of the operational semantics of the JSP we have written a
simple translator, which accepts a smaller class of Java programs. The simple
translator is used to validate the operational semantics.

A standard Java development environment can be used to write Java pro- 
grams for smart cards. Instead of relying on the standard class libraries the 
programmer uses the smart card class libraries. A simulator can be used to test 
the code. The process of loading Java programs into a card is quite different
from loading and running programs on a workstation, as it may involve
manufacturing ROM masks. We will not discuss this aspect further; the interested
reader is referred to the literature.

A smart card is a secure token that may control commodities of real value. 
Secure here means that the card should be hardware and software tamper
resistant, and that it should not leak information. The considerations that apply
to the security of Java in general also apply to Java for smart cards. In
addition, Java for smart cards should provide facilities such as ownership control
and cryptographically protected modes of use.

The resource limitations of a smart card make it more difficult to ensure that
security is maintained. For example, a complete byte code verifier is currently
too large to be implemented on a smart card. The JSP approach assumes that
JVM byte codes are verified when translated into JSP byte codes. The results
are then digitally signed so that tampering can be detected when code is being
loaded.

A clear, concise and complete specification of the semantics is a prerequisite 
for a successful and secure implementation of a JSP. The present document 
provides such a specification. The document is based on an informal description
of the JSP from JavaSoft, who are currently building a tool suite for a JSP.
The formal specification is self-contained but does not document the motivation
for many of the design decisions made for the JSP. The interested reader is
referred to the informal specification.

The present formal specification is a latos literate script. Latos is a
tool for developing operational semantics. Latos supports publication quality
rendering using LaTeX, execution and animation using a functional programming
language, and derivation tree browsing using Netscape. Latos helps to check
that a specification is operationally conservative. The latos meta language is
basically Miranda augmented with a notation for rules of inference and sets.
Developing a semantics as a literate script avoids clerical errors and confusion,
as syntax and type errors are detected by the tool.



1 Miranda is a trademark of Research Software Ltd.



The Operational Semantics of a Java Secure Processor 315 



The formal specification does not support the capabilities of a JSP development
environment. Instead the latos tool provides a tracing facility allowing for a
detailed study and analysis of executing application programs.

Related work on the semantics of the JVM includes the executable specification
of the 'defensive' JVM made by Computational Logic Inc., work by
Bertelsen on another subset of the JVM, and other chapters of this book.

The next section describes the restrictions imposed on the kind of Java
programs supported by a JSP on a smart card. Section 3 presents the execution
model of a JSP and Section 4 defines the instructions of the virtual machine.
The relationship between the JVM and the JSP is explored in Section 5. A brief
example of how the semantics of the JSP may be used to validate the behaviour
of a sample Java program is given in Section 6. The last section presents our
conclusions and suggestions for future work.



2 Java Language Restrictions 

The JSP design imposes a number of restrictions to allow a Java program to be 
run in the constrained runtime environment of a smart card. The most important 
restrictions are: 

— The JSP provides no support for threads, multi-dimensional arrays, floating 
point numbers, and Just-in-time byte code translation. 

— Exceptions may be raised by application programs, but they can only be 
handled by the system. 

— There is no garbage collection. Objects can be allocated dynamically but the 
majority are expected to be allocated statically using compile time garbage 
collection techniques. The formal specification allows objects to be allocated 
any time. It would be possible to state and prove a property about programs 
that are guaranteed not to allocate objects after a certain point in their 
execution. This constitutes a desirable safety property of those programs. 

— Class files cannot be loaded dynamically. Instead the software to be present 
in the card is loaded when the card is manufactured or personalised. 

— Recursive methods are discouraged and recursive class constructors and ex- 
ceptions are disallowed. 

— Integers and shorts are identified. The JVM to JSP byte code translator 
should ensure that the results obtained from a computation on the JSP are 
identical to the results that would have been obtained on a JVM. 

— The number of arguments, local variables, methods, and object instances are 
limited. 

Java programmers have to be aware of these restrictions when writing code 
that is intended for a JSP. Some of the restrictions can be circumvented by 
the use of appropriate class libraries. Others will be taken care of by program 
analysis techniques in the JVM to JSP translator. 

3 Execution Model

The JSP is a byte oriented stack machine. It also has a read-only memory area 
for storing methods and constants, an area of memory and some registers to 
maintain the book-keeping of the machine, and a heap. 

The data manipulated directly by Java programs is faithfully modelled by the 
semantics. In particular the operand stack, the fields of objects and the elements 
of arrays contain bytes only. A short or a reference is always treated as a pair 
of bytes. The structures that support the machine itself, such as the byte codes, 
stack frames and heap objects are modelled as higher level entities rather than 
as collections of bytes. The ensuing specification is of a low level, which makes 
it eminently suitable to serve as a guideline for implementors of a JSP. 

The formal specification defines all structured data (not scalars) of the virtual 
machine either as (partial) mappings or as algebraic data types (i.e., a sum of 
product types). Each of these is of a different type, that is incompatible with any 
other useful type. The latos system performs strong type checking to ensure 
that all the type constraints in the operational semantics are indeed satisfied. 

3.1 Basic Data 

The basic data in the formal specification are derived from the natural numbers.
Similarly, the raw data in the JSP implementation are derived from a sequence
of bytes. The type bit (below) permits any numeric value, but sensible values are
in the range 0 ... 1. (The equivalence symbol ≡ is used to bind a name to a type;
the equals symbol binds a name to a value.) In a JSP implementation, a boolean
is stored in a byte, which permits sensible values as well as non-sensible values.
We would have preferred to identify bit and bit_range, but unfortunately the type
system used by latos (i.e., the Hindley-Milner type system of Miranda) is not
strong enough to support subtypes.

    bit ≡ num;
    bit_range = 0 ... 1;

Other raw data and ranges defined in a similar way include the signed 8-bit 
byte, the signed 16-bit short and the unsigned 16-bit reference. The nullreference is 
a special reference value, which is represented as zero. Regular references should 
not have this particular value. 
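
As an illustration, the raw data ranges can be written down as simple range checks. The Python sketch below is not part of the specification: the constant and function names are hypothetical, and the byte and short bounds assume the usual two's complement ranges, which the text above does not spell out.

```python
# Range constants for the raw data types; all names are illustrative.
BIT_RANGE = range(0, 2)              # sensible bit values: 0 ... 1
BYTE_RANGE = range(-128, 128)        # signed 8-bit byte (assumed two's complement)
SHORT_RANGE = range(-32768, 32768)   # signed 16-bit short (assumed two's complement)
REFERENCE_RANGE = range(0, 65536)    # unsigned 16-bit reference
NULL_REFERENCE = 0                   # the null reference is represented as zero


def is_sensible_bit(value: int) -> bool:
    """A bit may hold any number; only 0 and 1 are sensible."""
    return value in BIT_RANGE


def is_regular_reference(value: int) -> bool:
    """Regular references lie in the reference range and are never zero."""
    return value in REFERENCE_RANGE and value != NULL_REFERENCE
```

Keeping the range separate from the type mirrors the situation described above, where the type system itself cannot express the subrange.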

3.2 Store Areas 

The JSP virtual machine uses a number of areas of store for data, code and 
book-keeping. Each of these areas is represented in the formal specification as a 
mapping of numerical indices onto values of the appropriate type, thus providing 
a uniform, albeit low level approach to information handling in the JSP. 

— A JSP uses a stack of activation frames, where each frame contains an
operand stack and some book-keeping. The activation frames are gathered
in the machine-wide frameArea. The frame area is represented as a partial
mapping from the domain framePointer to the range frame. The representation
as a partial function makes it possible to represent common operations
on structures in a clear and succinct way. The type frame itself is defined in
Section 3.3.

    framePointer ≡ num;
    framePointer_range = 0 ... 255;
    frameArea ≡ framePointer ⇸ frame;

— Heap objects are instances of classes or arrays. The objects are gathered in
the machine-wide heapArea. The type object is defined in Section 3.5.

    heapPointer ≡ num;
    heapPointer_range = 0 ... 65535;
    heapArea ≡ heapPointer ⇸ object;

— Static program data are represented by bytes. These data are gathered in the
machine-wide staticArea.

    staticPointer ≡ num;
    staticPointer_range = 0 ... 65535;
    staticArea ≡ staticPointer ⇸ byte;

— The machine-wide codeArea gathers the byte codes and the method headers
for the methods of all application programs in the system. The type byteCode
is defined in Section 4.

    programCounter ≡ num;
    programCounter_range = 0 ... 65535;
    codeArea ≡ programCounter ⇸ byteCode;

— The application program table progTable records the class table of each
loaded application program.

    progId ≡ num;
    progId_range = 0 ... 63;
    progTable ≡ progId ⇸ classTable;

— There is one instance of class Class for each class in the system. The class
table gathers such instances. The type classObject is defined in Section 3.5.

    classId ≡ num;
    classId_range = 0 ... 127;
    classTable ≡ classId ⇸ classObject;

— Each class in the system is accompanied by a method table, which maps
a method id onto the program counter value at which the method header
is located. The methodTable is defined as an algebraic data type with two
components and with constructor MethodTable.

    methodId ≡ num;
    methodId_range = 0 ... 255;
    entryTable ≡ methodId ⇸ programCounter;
    methodTable ≡ MethodTable classId entryTable;
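
To make the idea of a store area as a partial mapping concrete, the following Python sketch models the static area as a dictionary together with its pointer range. It is only an illustration under stated assumptions: the helper names update and lookup are hypothetical, and a dictionary is merely one convenient encoding of a partial function.

```python
from typing import Dict, Optional

STATIC_POINTER_RANGE = range(0, 65536)   # staticPointer_range = 0 ... 65535


def update(area: Dict[int, int], pointer: int, value: int,
           pointer_range: range) -> Dict[int, int]:
    """Return a new mapping in which pointer is associated with value."""
    if pointer not in pointer_range:
        raise ValueError("pointer outside the permitted range")
    return {**area, pointer: value}


def lookup(area: Dict[int, int], pointer: int) -> Optional[int]:
    """A partial mapping is undefined (None here) outside its domain."""
    return area.get(pointer)


# One byte of static data stored at index 10.
static_area = update({}, 10, 0x7F, STATIC_POINTER_RANGE)
```

The same two helpers would serve for the frame, heap and code areas, since all of them are mappings from a numeric index to a value.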



3.3 Stack Frames 

The operand stack within the topmost frame plays a special role in that it
can be accessed by the JSP instructions. To acknowledge this special role, the
formal specification shadows the operand stack, and manipulates it as a separate
component of the virtual machine configuration.

A method invocation creates a stack frame (shown below as an instance of 
the data type frame). The frame has the following four components: 

— the programCounter representing the return address to the caller of the 
method. 

— the framePointer to the previous frame. This information is redundant in 
the specification, as frames are numbered sequentially starting from 0. In an 
implementation frames would be referred to by their address, in which case 
the frame pointer is needed. 

— the stackPointer within the operand stack.

— the operandStack, which holds the local and temporary variables of
the current method.

In the specifications that follow, a stack is always accompanied by a stack 
pointer (which points at the last used element). All stack operations can be 
modelled by a combination of adding (or subtracting) a constant to (from) the 
stack pointer and/or updating the mapping. For example, pushing an element 
onto the stack means incrementing the pointer and updating the mapping with 
a new association. 

    stackPointer ≡ num;
    stackPointer_range = 0 ... 255;
    operandStack ≡ stackPointer ⇸ byte;
    frame ≡ Frame programCounter framePointer stackPointer operandStack;

A JSP uses a slightly different stack frame configuration than the JVM, a 
difference that is taken into account by the JVM to JSP byte code translator. 
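
The frame structure and the stack discipline described above can be sketched in Python as follows. This is an illustrative model only; the field and function names are hypothetical, and the overflow check uses the stack pointer range 0 ... 255 given above.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

STACK_POINTER_RANGE = range(0, 256)   # stackPointer_range = 0 ... 255


@dataclass(frozen=True)
class Frame:
    program_counter: int              # return address to the caller
    frame_pointer: int                # pointer to the previous frame
    stack_pointer: int                # points at the last used element
    operand_stack: Dict[int, int]     # partial mapping stackPointer -> byte


def push(frame: Frame, value: int) -> Frame:
    """Increment the pointer and update the mapping with a new association."""
    sp = frame.stack_pointer + 1
    if sp not in STACK_POINTER_RANGE:
        raise OverflowError("operand stack overflow")
    return replace(frame, stack_pointer=sp,
                   operand_stack={**frame.operand_stack, sp: value})


def pop(frame: Frame) -> Tuple[int, Frame]:
    """Read the top element and decrement the pointer; the mapping is kept."""
    value = frame.operand_stack[frame.stack_pointer]
    return value, replace(frame, stack_pointer=frame.stack_pointer - 1)
```

Note that pop leaves the old association in the mapping; only the stack pointer decides which entries are live.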

3.4 Headers 

Objects and methods have headers, which record book-keeping information. This 
section describes all possible headers in the system. 

— An objectHeader records the identity of the application program progId, the
size of the object in bytes instanceSize and a table listing all the methods for
the object. The classId for the object is available from the methodTable.

    instanceSize ≡ num;
    instanceSize_range = 0 ... 127;
    objectHeader ≡ ObjectHeader progId instanceSize methodTable;

— An array may contain scalars (bits, bytes or shorts) or references to objects.
An array has a header, which records the application program id, the class
id of the element type, the method table for the element type, an indication
of the element type and the length of the array.

    dataType ≡ bit | byte | short | ref;
    arrayLength ≡ num;
    arrayLength_range = 1 ... 4096;
    arrayHeader ≡ ArrayHeader progId classId methodTable dataType arrayLength;

— A method has a header, which records two flags and three sizes. The flags
record whether the method is native and whether it is public. The stack
size is currently unused, but the paramsSize and localsSize are used to create
appropriate frames. Stack frames are limited in size due to the limitations
on available RAM space in smart cards.

    isNative ≡ bool;
    isPublic ≡ bool;
    stackSize ≡ num;
    stackSize_range = 0 ... 15;
    paramsSize ≡ num;
    paramsSize_range = 0 ... 15;
    localsSize ≡ num;
    localsSize_range = 0 ... 15;
    methodHeader ≡ MethodHeader isNative isPublic stackSize paramsSize localsSize;

3.5 Objects 

The JSP works with three different kinds of objects: 

— A regular object has a header and a number of fields represented by the
fieldTable. The fields are represented as bytes and the methods are available
from the header.

    fieldId ≡ num;
    fieldId_range = 0 ... 255;
    fieldTable ≡ fieldId ⇸ byte;
    regularObject ≡ RegularObject objectHeader fieldTable;

— An array object records an array header as well as the array elements. The
elements are represented as bytes.

    arrayIndex ≡ num;
    arrayIndex_range = 0 ... 4095;
    arrayTable ≡ arrayIndex ⇸ byte;
    arrayObject ≡ ArrayObject arrayHeader arrayTable;

— There is one classObject for every class in the system. The classObject
itself is an instance of class Class. The classObject records the normal object
header as well as the size of an instance of the class, the method table for the
class, the depth in the class hierarchy, the classId of the super classes and the
interface classes implemented by the class. The instance size and the method
table are redundant as the object header also contains this information.

    classDepth ≡ num;
    classDepth_range = 0 ... 255;
    superId ≡ num;
    superId_range = 0 ... 255;
    superTable ≡ superId ⇸ classId;
    interfaceId ≡ num;
    interfaceId_range = 0 ... 255;
    implementTable ≡ methodId ⇸ methodId;
    interfaceTable ≡ interfaceId ⇸ implementTable;
    classObject ≡ ClassObject objectHeader instanceSize methodTable
                  classDepth superTable interfaceTable;



The JSP heap is used to store regular and array objects only. A classObject
is allocated statically in an area separate from the heap. The union type object
therefore does not cover class objects.

    object ≡ regularObject | arrayObject;

The two auxiliary predicates below are used to determine whether we are
dealing with a regular object or an array object.

    isRegularObject regularObject = True;
    isRegularObject arrayObject = False;
    isArrayObject arrayObject = True;
    isArrayObject regularObject = False;
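
The union type and its two predicates could be mirrored in Python along the following lines. This is a sketch under simplifying assumptions: the headers are reduced to opaque tuples, and the class and function names are illustrative rather than taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class RegularObject:
    header: tuple                 # simplified stand-in for objectHeader
    fields: Dict[int, int]        # fieldTable: fieldId -> byte


@dataclass
class ArrayObject:
    header: tuple                 # simplified stand-in for arrayHeader
    elements: Dict[int, int]      # arrayTable: arrayIndex -> byte


# Heap objects are either regular objects or array objects; class objects
# live outside the heap and are therefore not part of this union.
HeapObject = Union[RegularObject, ArrayObject]


def is_regular_object(obj: HeapObject) -> bool:
    return isinstance(obj, RegularObject)


def is_array_object(obj: HeapObject) -> bool:
    return isinstance(obj, ArrayObject)
```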

4 Instruction Set 



There are 25 different categories of JSP byteCode (below), all with their own
type. The methodHeader is treated as a pseudo instruction. This models the
practice of preceding the code for each method by its header.

    byteCode ≡ methodHeader |
               constInst | loadInst | storeInst | incInst | stackInst |
               newarrayInst | arrayLoadInst | arrayStoreInst |
               arithInst | logicalInst | convertInst | compareInst |
               controlInst | switchInst | exceptionInst |
               invokeinterfaceInst | invokevirtualInst |
               invokeInst | returnInst |
               objectInst | instanceInst |
               getfieldInst | putfieldInst | getstaticInst | putstaticInst |
               breakpointInst;

The following categories of byte codes have been defined: 



— Load, store and increment instructions.

    constInst ≡ nop | bpush byte | spush byte byte | apush byte byte |
                aconst_null | bconst_m1 |
                bconst_0 | bconst_1 | bconst_2 | bconst_3 | bconst_4 | bconst_5;
    loadInst ≡ bload stackPointer | bload_0 | bload_1 | bload_2 | bload_3 |
               sload stackPointer | sload_0 | sload_1 | sload_2 | sload_3 |
               aload stackPointer | aload_0 | aload_1 | aload_2 | aload_3;
    storeInst ≡ bstore stackPointer | bstore_0 | bstore_1 | bstore_2 | bstore_3 |
                sstore stackPointer | sstore_0 | sstore_1 | sstore_2 | sstore_3 |
                astore stackPointer | astore_0 | astore_1 | astore_2 | astore_3;
    incInst ≡ binc stackPointer byte | sinc stackPointer byte;

— Stack instructions.

    stackInst ≡ pop | pop2 | dup | dup2 | dup_x byte | swap | swap2;

— Array creation, load and store instructions.

    newarrayInst ≡ newarray dataType | anewarray classId;
    arrayLoadInst ≡ arraylength | baload | saload | aaload;
    arrayStoreInst ≡ bastore | sastore | aastore;

— Instructions for arithmetical, logical, conversion and comparison operations.

    arithInst ≡ bneg | sneg | badd | sadd | bsub | ssub |
                bmul | smul | bdiv | sdiv | brem | srem;
    logicalInst ≡ bshl | bshr | bushr | sshl | sshr | sushr |
                  band | sand | bor | sor | bxor | sxor;
    convertInst ≡ s2b | b2s;
    compareInst ≡ bcmp | scmp | acmp;

— Instructions for the transfer of control.

    offset ≡ (byte, byte);
    controlInst ≡ ifeq offset | iflt offset | ifgt offset |
                  ifne offset | ifge offset | ifle offset | goto offset;

— Instructions to support switch statements.

    tableswitchIndex ≡ num;
    tableswitchIndex_range = 0 ... 127;
    tableswitchTable ≡ tableswitchIndex ⇸ offset;
    lookupswitchIndex ≡ num;
    lookupswitchIndex_range = 0 ... 126;
    lookupswitchTable ≡ lookupswitchIndex ⇸ (byte, offset);
    switchInst ≡ tableswitch offset byte byte tableswitchTable |
                 lookupswitch offset byte lookupswitchTable;

— Instructions to support exceptions.

    exceptionInst ≡ athrow | jsr offset | ret stackPointer;

— Instructions for method invocation.

    invokeinterfaceInst ≡ invokeinterface paramsSize interfaceId methodId;
    invokevirtualInst ≡ invokevirtual paramsSize methodId;
    invokeInst ≡ invoke offset;
    returnInst ≡ breturn | sreturn | areturn | return;

— Instructions for object creation and manipulation.

    objectInst ≡ new classId;
    instanceInst ≡ instanceof classId | checkcast classId |
                   ainstanceof dataType | acheckcast dataType |
                   aainstanceof classId | aacheckcast classId;
    getfieldInst ≡ bgetfield stackPointer | sgetfield stackPointer;
    putfieldInst ≡ bputfield stackPointer | sputfield stackPointer;
    getstaticInst ≡ bgetstatic byte byte | sgetstatic byte byte;
    putstaticInst ≡ sputstatic byte byte | bputstatic byte byte;

— Miscellaneous instructions.

    breakpointInst ≡ breakpoint;




    classTable     = classId ⇸ classObject
    classObject    = ClassObject objectHeader instanceSize methodTable
                     classDepth superTable interfaceTable
    objectHeader   = ObjectHeader progId instanceSize methodTable
    methodTable    = MethodTable classId entryTable
    entryTable     = methodId ⇸ programCounter
    superTable     = superId ⇸ classId
    interfaceTable = interfaceId ⇸ implementTable
    implementTable = methodId ⇸ methodId

Fig. 1. Read only structures.
    staticArea    = staticPointer ⇸ byte
    heapArea      = heapPointer ⇸ object
    frameArea     = framePointer ⇸ frame
    frame         = Frame programCounter framePointer stackPointer operandStack
    operandStack  = stackPointer ⇸ byte
    object        = arrayObject | regularObject
    arrayObject   = ArrayObject arrayHeader arrayTable
    arrayHeader   = ArrayHeader progId classId methodTable dataType arrayLength
    arrayTable    = arrayIndex ⇸ byte
    regularObject = RegularObject objectHeader fieldTable
    objectHeader  = ObjectHeader progId instanceSize methodTable
    methodTable   = MethodTable classId entryTable
    entryTable    = methodId ⇸ programCounter
    fieldTable    = fieldId ⇸ byte

Fig. 2. Structures that can be written to.



We have now completed the definition of the JSP machine structures. To
assist the reader in retrieving a particular definition, Figures 1 and 2 summarise
the read only structures and the structures that are written to during execution
of a JSP program respectively. For each of the three different kinds of structures
that we have used, the name is given (followed by an = symbol) and a suggestive
graphical representation. The partial maps are shown in a single box, with the
domain to the left of the ⇸ symbol and the range to the right. A product data
type is shown as a sequence of vertically stacked boxes, one for each component.
A sum data type is shown as a horizontally arranged sequence of boxes.

The following sections present the semantic rules for a representative selection 
of the JSP byte codes. Since there are many groups of similar byte codes, we 
consider it justified to give the rule for just one member of each group without 
sacrificing the rigour of the specification. 



4.1 Pushing Constants onto the Stack 

The stack is controlled by the stack pointer, which points at the last used
location. A short occupies two consecutive locations in the stack, with the high
byte at the lowest stack pointer index (big-endian).






Pieter H. Hartel, Michael J. Butler, and Moshe Levy 



Table 1. Labelled equality relations. The type given is that of the two operands.

    Label  Type of operands
    ha     heapArea
    it     implementTable
    ob     object
    oh     objectHeader
    os     operandStack
    p      (byte, byte)
    ps     [(byte, byte)]
    ah     arrayHeader
    at     arrayTable
    b      byte
    ct     classTable
    fa     frameArea
    ft     fieldTable
    s      short
    hp     heapPointer
    pc     programCounter
    sa     staticArea
    (illegible)  frame
    bc     byteCode



The relation $\stackrel{const}{\Rightarrow}$ below describes the effects of each of
the instructions dealing with constants on the stack. The type of the relation
shows that in addition to the instruction itself, only the stack pointer and the
operand stack are relevant here. The left operand of the relation specifies the
machine components that are accessed, the right operand mentions those that may
be changed by the instruction. Specifying the types of the relations thus provides
an aid in the documentation of the system. The types of all relations of the JSP
transition system are summarised in Table 2. We will not give the explicit types
of the remaining relations.

    lhs_const ≡ (constInst, stackPointer, operandStack);
    rhs_const ≡ (stackPointer, operandStack);

$$\stackrel{const}{\Rightarrow} \;::\; (\mathit{lhs}_{const} \leftrightarrow \mathit{rhs}_{const});$$

The rules for nop, bpush and spush below reveal most aspects of the notation
that we are using. The semantics of an instruction is defined by an axiom or a rule
of inference. The text in square brackets to the left of the axiom/rule is a label
to identify the rule. A rule has a number of premises (above the horizontal line)
and a conclusion. An axiom has a conclusion but no premises. Rules and axioms
may have side conditions. The two axioms and the rule below together define
the relation $\stackrel{const}{\Rightarrow}$ over components of the JSP virtual
machine configurations.

$$[\mathit{nop}]\quad \vdash (\mathit{nop},\ sp,\ os) \stackrel{const}{\Rightarrow} (sp,\ os);$$

$$[\mathit{bpush}]\quad \vdash (\mathit{bpush}\ v,\ sp,\ os) \stackrel{const}{\Rightarrow} (sp + 1,\ os \oplus \{sp + 1 \mapsto v\}),\quad \text{if } (sp + 1) \in \mathit{stackPointer}_{range};$$

$$[\mathit{spush}]\quad \frac{\vdash os \oplus \{sp + 1 \mapsto hi\} \oplus \{sp + 2 \mapsto lo\} \stackrel{os}{=} os'}{\vdash (\mathit{spush}\ hi\ lo,\ sp,\ os) \stackrel{const}{\Rightarrow} (sp + 2,\ os')},\quad \text{if } (sp + 1 \ldots sp + 2) \subseteq \mathit{stackPointer}_{range};$$

The configuration on the left hand side of the arrow consists of an instruction
and its operands (e.g. spush hi lo), the current stack pointer (sp), and the operand
stack (os). Other components of the JSP machine, such as the heap, are not used
by the three rules above.

The configuration on the right hand side consists of the next value of the stack
pointer (e.g. sp + 2) and the new operand stack (os'). Some of the components
mentioned on the left hand side are not present on the right hand side, because
they are not changed by the instruction. We have been careful in exposing only
the information required, so as to improve the clarity and succinctness of the
specification.

The premise of the spush rule asserts a relationship between components of
the old and the new configuration. The relation $\stackrel{os}{=}$ is an equality
relation, which holds when the operands are both of type operandStack. Labelling
equalities with the type of the operands helps the mechanical type checker spot
clerical errors. Many other labelled equalities are used throughout. The labels
and the types of the operands are summarised in Table 1. The actual definition
of the relations is omitted.

The notation $os \oplus \{sp + 1 \mapsto v\}$ extends the mapping os with a new
domain/range pair. Any previous association for the new domain value sp + 1 is
lost. It follows that it is sufficient to decrement the stack pointer to 'forget'
mappings for particular values in the domain. Furthermore, we do not in general
have the invariant domain(os) = 0 ... sp.
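
A small Python experiment shows why the invariant fails. The sketch below is illustrative only: the helper name extend is hypothetical, and a dictionary stands in for the partial mapping os.

```python
from typing import Dict


def extend(mapping: Dict[int, int], key: int, value: int) -> Dict[int, int]:
    """Extend the mapping with key -> value; an older association for key is lost."""
    return {**mapping, key: value}


os_map: Dict[int, int] = {}
sp = 0

# Push 5, then push 9.
sp += 1
os_map = extend(os_map, sp, 5)
sp += 1
os_map = extend(os_map, sp, 9)

# Pop: only the stack pointer changes, the mapping keeps the stale entry.
sp -= 1
domain_after_pop = set(os_map)

# Push 4: the stale association at index 2 is overridden.
sp += 1
os_map = extend(os_map, sp, 4)
```

After the pop the stack pointer is 1 while the domain of the mapping still contains index 2, so the domain is larger than the live part of the stack until the next push overrides the stale entry.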

The side condition for the bpush and spush operations determines when it 
is safe to extend the stack. If it is not safe, then the relation does not hold. 

The rule for the apush operation is not shown here because it is identical 
to that of the spush operation: an address is a numeric value and therefore 
indistinguishable from a short. In a typed version of the JSP the instructions 
would not be the same. 
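
The three rules of this section can be read as partial functions on configurations. The Python sketch below is one possible reading, not the latos script itself: a step returns None when a side condition fails (the relation does not hold), and all function and type names are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

STACK_POINTER_RANGE = range(0, 256)   # stackPointer_range = 0 ... 255

Stack = Dict[int, int]                # partial mapping stackPointer -> byte
Config = Tuple[int, Stack]            # (stack pointer, operand stack)


def nop(sp: int, os: Stack) -> Optional[Config]:
    """[nop]: the configuration is unchanged."""
    return sp, os


def bpush(v: int, sp: int, os: Stack) -> Optional[Config]:
    """[bpush]: associate sp + 1 with v, if sp + 1 is a legal stack pointer."""
    if sp + 1 not in STACK_POINTER_RANGE:
        return None                   # side condition fails
    return sp + 1, {**os, sp + 1: v}


def spush(hi: int, lo: int, sp: int, os: Stack) -> Optional[Config]:
    """[spush]: high byte at sp + 1, low byte at sp + 2 (big-endian)."""
    if sp + 1 not in STACK_POINTER_RANGE or sp + 2 not in STACK_POINTER_RANGE:
        return None
    return sp + 2, {**os, sp + 1: hi, sp + 2: lo}
```

Since apush is described as identical to spush, the same function would serve for it in this untyped reading.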

4.2 Pushing Immediate Constants 

Some constants are needed so often that special instructions have been defined to
push them onto the stack. The semantics of the specialised instructions such as
bconst_0 (below) is defined in terms of the general operation bpush. The rules for
the remaining instructions aconst_null, bconst_m1, bconst_1 ... bconst_5 (not shown)
are defined in a similar way.

$$[\mathit{bconst}_0]\quad \frac{\vdash (\mathit{bpush}\ 0,\ sp,\ os) \stackrel{const}{\Rightarrow} (sp',\ os')}{\vdash (\mathit{bconst}_0,\ sp,\ os) \stackrel{const}{\Rightarrow} (sp',\ os')};$$
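
In executable terms, a specialised instruction simply delegates to the general one. The Python sketch below illustrates this under assumptions: bconst is a hypothetical helper that covers bconst_m1 (n = -1) up to bconst_5, and bpush is repeated here only to keep the example self-contained.

```python
from typing import Dict, Optional, Tuple

STACK_POINTER_RANGE = range(0, 256)   # stackPointer_range = 0 ... 255


def bpush(v: int, sp: int, os: Dict[int, int]) -> Optional[Tuple[int, Dict[int, int]]]:
    """General push of one byte; None when the side condition fails."""
    if sp + 1 not in STACK_POINTER_RANGE:
        return None
    return sp + 1, {**os, sp + 1: v}


def bconst(n: int, sp: int, os: Dict[int, int]) -> Optional[Tuple[int, Dict[int, int]]]:
    """bconst_n holds exactly when the corresponding bpush n transition holds."""
    if n not in range(-1, 6):
        raise ValueError("no specialised instruction for this constant")
    return bpush(n, sp, os)
```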



4.3 Loading Local Variables onto the Stack 

The load instructions transfer values from the parameter and local variable area 
of the stack frame to the top of the operand stack. Local variables and parameters 
are accessed via a fixed index from the bottom of the operand stack. The reader 
is reminded that the operand stack is just a portion of the current frame, but 
we view the operand stack separately from the frame for convenience. 

The side conditions on the rules below check for stack overflow. There is no 
explicit check on the value of the index i because it is assumed that the static 
semantics of the byte codes, as enforced by the byte code verifier, will deal with 
illegal offsets. 
Table 2. A summary of the types of all relations defining the transition system
of the Java secure processor.

Column key: pc = programCounter, ca = codeArea, bc = byteCode, sp = stackPointer,
os = operandStack, fp = framePointer, fa = frameArea, hp = heapPointer,
ha = heapArea, pid = progId, pt = progTable, sa = staticArea, out = outputStream.

    relation         instruction type     pc  ca  bc  sp  os  fp  fa  hp  ha  pid pt  sa  out
    const            constInst            -   -   -   rw  rw  -   -   -   -   -   -   -   -
    load             loadInst             -   -   -   rw  rw  -   -   -   -   -   -   -   -
    store            storeInst            -   -   -   rw  rw  -   -   -   -   -   -   -   -
    inc              incInst              -   -   -   r   rw  -   -   -   -   -   -   -   -
    stack            stackInst            -   -   -   rw  rw  -   -   -   -   -   -   -   -
    newarray         newarrayInst         -   -   -   r   rw  -   -   rw  rw  r   r   -   -
    arrayload        arrayLoadInst        -   -   -   rw  r   -   -   -   r   -   -   -   -
    arraystore       arrayStoreInst       -   -   -   rw  r   -   -   -   rw  -   -   -   -
    arith            arithInst            -   -   -   rw  rw  -   -   -   -   -   -   -   -
    logical          logicalInst          -   -   -   rw  rw  -   -   -   -   -   -   -   -
    conv             convInst             -   -   -   rw  rw  -   -   -   -   -   -   -   -
    compare          compareInst          -   -   -   rw  rw  -   -   -   -   -   -   -   -
    control          controlInst          rw  -   -   rw  rw  -   -   -   -   -   -   -   -
    switch           switchInst           rw  -   -   rw  rw  -   -   -   -   -   -   -   -
    exception        exceptionInst        rw  -   -   rw  rw  -   -   -   -   -   -   -   -
    invokeinterface  invokeinterfaceInst  rw  r   -   rw  rw  rw  rw  -   r   r   r   -   -
    invokevirtual    invokevirtualInst    rw  r   -   rw  rw  rw  rw  -   r   -   -   -   -
    invoke           invokeInst           rw  r   -   rw  rw  rw  rw  -   -   -   -   -   -
    return           returnInst           w   -   -   rw  rw  rw  r   -   -   -   -   -   -
    object           objectInst           -   -   -   rw  rw  -   -   rw  rw  r   r   -   -
    instance         instanceInst         -   -   -   rw  rw  -   -   r   r   r   r   -   -
    getfield         getfieldInst         -   -   -   rw  r   -   -   -   r   -   -   -   -
    putfield         putfieldInst         -   -   -   rw  r   -   -   -   rw  -   -   -   -
    getstatic        getstaticInst        -   -   -   rw  r   -   -   -   -   -   -   r   -
    putstatic        putstaticInst        -   -   -   rw  r   -   -   -   -   -   -   rw  -
    breakpoint       breakpointInst       -   -   -   rw  r   -   -   -   -   -   -   -   rw
    exec             execInst             rw  r   r   rw  rw  rw  rw  rw  rw  r   r   rw  rw


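$$[\mathit{bload}]\quad \frac{\vdash os(i) \stackrel{b}{=} v}{\vdash (\mathit{bload}\ i,\ sp,\ os) \stackrel{load}{\Rightarrow} (sp + 1,\ os \oplus \{sp + 1 \mapsto v\})},\quad \text{if } (sp + 1) \in \mathit{stackPointer}_{range};$$

$$[\mathit{sload}]\quad \frac{\begin{array}{l}\vdash (os(i),\ os(i + 1)) \stackrel{p}{=} (hi,\ lo),\\ \vdash os \oplus \{sp + 1 \mapsto hi\} \oplus \{sp + 2 \mapsto lo\} \stackrel{os}{=} os'\end{array}}{\vdash (\mathit{sload}\ i,\ sp,\ os) \stackrel{load}{\Rightarrow} (sp + 2,\ os')},\quad \text{if } (sp + 1 \ldots sp + 2) \subseteq \mathit{stackPointer}_{range};$$

The rule for aload and those for the specialised versions bload_0 ... bload_3,
sload_0 ... sload_3 and aload_0 ... aload_3 are not shown here.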
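
The two load rules can be sketched in Python as follows. The sketch is an informal reading: a missing association for the index i is treated as the relation not holding (None), and the function names and the dictionary encoding of the operand stack are assumptions.

```python
from typing import Dict, Optional, Tuple

STACK_POINTER_RANGE = range(0, 256)   # stackPointer_range = 0 ... 255
Stack = Dict[int, int]


def bload(i: int, sp: int, os: Stack) -> Optional[Tuple[int, Stack]]:
    """[bload]: copy the byte at index i to the new top of stack."""
    if sp + 1 not in STACK_POINTER_RANGE or i not in os:
        return None
    return sp + 1, {**os, sp + 1: os[i]}


def sload(i: int, sp: int, os: Stack) -> Optional[Tuple[int, Stack]]:
    """[sload]: copy the byte pair at i, i + 1 to sp + 1, sp + 2."""
    in_range = sp + 1 in STACK_POINTER_RANGE and sp + 2 in STACK_POINTER_RANGE
    if not in_range or i not in os or i + 1 not in os:
        return None
    hi, lo = os[i], os[i + 1]
    return sp + 2, {**os, sp + 1: hi, sp + 2: lo}
```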



4.4 Storing Stack Values into Local Variables 

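The store instructions transfer values from the operand stack into the parameter
and local variable area of the stack frame. This time the side conditions check
for stack underflow.

$$[\mathit{bstore}]\quad \frac{\vdash os(sp) \stackrel{b}{=} v}{\vdash (\mathit{bstore}\ i,\ sp,\ os) \stackrel{store}{\Rightarrow} (sp - 1,\ os \oplus \{i \mapsto v\})},\quad \text{if } sp \in \mathit{stackPointer}_{range};$$

$$[\mathit{sstore}]\quad \frac{\begin{array}{l}\vdash (os(sp - 1),\ os(sp)) \stackrel{p}{=} (hi,\ lo),\\ \vdash os \oplus \{i \mapsto hi\} \oplus \{i + 1 \mapsto lo\} \stackrel{os}{=} os'\end{array}}{\vdash (\mathit{sstore}\ i,\ sp,\ os) \stackrel{store}{\Rightarrow} (sp - 2,\ os')},\quad \text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range};$$

The astore instruction is identical to the sstore instruction. The specialised
instructions bstore_0 ... bstore_3, sstore_0 ... sstore_3 and astore_0 ... astore_3
are not shown here.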
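
A corresponding Python sketch of the two store rules is given below. As before it is an illustration under assumptions: the names are hypothetical, the operand stack is a dictionary, and None signals that the relation does not hold.

```python
from typing import Dict, Optional, Tuple

STACK_POINTER_RANGE = range(0, 256)   # stackPointer_range = 0 ... 255
Stack = Dict[int, int]


def bstore(i: int, sp: int, os: Stack) -> Optional[Tuple[int, Stack]]:
    """[bstore]: move the top byte into local i and decrement sp by one."""
    if sp not in STACK_POINTER_RANGE or sp not in os:
        return None
    return sp - 1, {**os, i: os[sp]}


def sstore(i: int, sp: int, os: Stack) -> Optional[Tuple[int, Stack]]:
    """[sstore]: move the top byte pair into locals i, i + 1; sp drops by two."""
    in_range = sp - 1 in STACK_POINTER_RANGE and sp in STACK_POINTER_RANGE
    if not in_range or sp - 1 not in os or sp not in os:
        return None
    hi, lo = os[sp - 1], os[sp]
    return sp - 2, {**os, i: hi, i + 1: lo}
```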



Table 3. Explicit conversions between arbitrary integers and shorts (n2s),
arbitrary integers and bytes (n2b), between shorts and pairs of bytes (s2p, p2s),
between booleans and bytes (b2b) and a range comparison operator ==.

    n2s :: num → short;
    n2s(n) = n mod 32768;

    n2b :: num → byte;
    n2b(n) = n mod 128;

    s2p :: short → (byte, byte);
    s2p(s) = (s div 256, s mod 256);

    p2s :: (byte, byte) → short;
    p2s(hi, lo) = 256*hi + lo;

    b2b :: bool → byte;
    b2b True = 1;
    b2b False = 0;

    == :: num → num → num;
    x == y = 1, if x > y;
           = 0, if x = y;
           = -1, otherwise;
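
The conversions of Table 3 translate almost line by line into Python. The sketch below is illustrative: it assumes that div and mod round towards negative infinity, as Python's // and % do, and the name compare is a hypothetical stand-in for the range comparison operator.

```python
from typing import Tuple


def n2s(n: int) -> int:
    """Map an arbitrary integer into the range of a short."""
    return n % 32768


def n2b(n: int) -> int:
    """Map an arbitrary integer into the range of a byte."""
    return n % 128


def s2p(s: int) -> Tuple[int, int]:
    """Split a short into a (high byte, low byte) pair."""
    return s // 256, s % 256


def p2s(hi: int, lo: int) -> int:
    """Combine a (high byte, low byte) pair into a short."""
    return 256 * hi + lo


def b2b(flag: bool) -> int:
    """Represent a boolean as a byte."""
    return 1 if flag else 0


def compare(x: int, y: int) -> int:
    """Range comparison: 1 if x > y, 0 if x = y, -1 otherwise."""
    if x > y:
        return 1
    if x == y:
        return 0
    return -1
```

Because the byte order is confined to s2p and p2s, switching to a little-endian layout would only require swapping the components in these two functions.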
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4.5 Increment Instructions 

The increment instructions load the value of a local variable, increment the
value by a signed 8-bit constant, and store the result. There is no scope for
stack underflow or stack overflow, but it is possible for the data to underflow
or overflow. This particular error condition is ignored by the JSP. The
specification models this behaviour by using a conversion function n2s, which
maps out-of-bounds values into the range of a short. The functions of Table 3
define explicit conversions between arbitrary integers and shorts (n2s),
arbitrary integers and bytes (n2b), between shorts and pairs of bytes (s2p, p2s),
and between booleans and bytes (b2b). These conversions are used consistently
throughout the document, so that it would be easy to change the byte order of
shorts. This approach makes it easier to implement the JSP on platforms with
different views on number representations.

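\[
[\mathit{binc}]\quad
\frac{\vdash n2b(os(i) + c) \Rightarrow v}
     {\vdash (\mathit{binc}\ i\ c, sp, os) \Rightarrow (os \oplus \{i \mapsto v\})}
\]

\[
[\mathit{sinc}]\quad
\frac{\begin{array}{l}
  \vdash s2p(n2s(p2s(os(i), os(i + 1)) + c)) \Rightarrow (hi, lo),\\
  \vdash os \oplus \{i \mapsto hi\} \oplus \{i + 1 \mapsto lo\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{sinc}\ i\ c, sp, os) \Rightarrow (os')}
\]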
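As an illustration, the conversion functions of Table 3 and the two increment rules can be transcribed into Python. This is only a sketch under stated assumptions: it assumes that mod and div behave like Python's floor-based % and //, it models os as a dictionary from slot index to byte, and the name range_cmp (standing for the == operator) is ours rather than the specification's.

```python
def n2s(n):
    """Map an arbitrary integer into the range of a short (Table 3)."""
    return n % 32768


def n2b(n):
    """Map an arbitrary integer into the range of a byte (Table 3)."""
    return n % 128


def s2p(s):
    """Split a short into a (hi, lo) pair of bytes."""
    return (s // 256, s % 256)


def p2s(hi, lo):
    """Combine a (hi, lo) pair of bytes into a short."""
    return 256 * hi + lo


def b2b(flag):
    """Convert a boolean into a byte."""
    return 1 if flag else 0


def range_cmp(x, y):
    """The range comparison operator == of Table 3 (hypothetical name)."""
    if x > y:
        return 1
    if x == y:
        return 0
    return -1


def binc(i, c, os):
    """Rule [binc]: add c to the byte in slot i, wrapping with n2b."""
    return {**os, i: n2b(os[i] + c)}


def sinc(i, c, os):
    """Rule [sinc]: add c to the short in slots i and i+1, wrapping with n2s."""
    hi, lo = s2p(n2s(p2s(os[i], os[i + 1]) + c))
    return {**os, i: hi, i + 1: lo}
```

Because every short passes through s2p and p2s, changing the byte order would only require changing these two functions, which is the design point made in the paragraph above.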



4.6 Stack Instructions 

The stack manipulation instructions are intended to rearrange information on 
the operand stack. The side conditions check for stack underflow and/or overflow. 

— The pop and pop2 instructions remove one and two bytes respectively from 
the stack. There are no separate pop instructions for shorts and references, 
to save opcodes. 

\[
[\mathit{pop}]\quad \vdash (\mathit{pop}, sp, os) \Rightarrow (sp - 1, os)
\]

\[
[\mathit{pop2}]\quad \vdash (\mathit{pop2}, sp, os) \Rightarrow (sp - 2, os)
\]

— The dup and dup2 instructions duplicate one and two bytes respectively on 
top of the stack. 

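\[
[\mathit{dup}]\quad
\frac{\vdash os(sp) \Rightarrow v}
     {\vdash (\mathit{dup}, sp, os) \Rightarrow (sp + 1, os \oplus \{sp + 1 \mapsto v\})}
\quad \text{if } (sp \ldots sp + 1) \subseteq \mathit{stackPointer}_{range}
\]

\[
[\mathit{dup2}]\quad
\frac{\begin{array}{l}
  \vdash (os(sp - 1), os(sp)) \Rightarrow (v_2, v_1),\\
  \vdash os \oplus \{sp + 1 \mapsto v_2\} \oplus \{sp + 2 \mapsto v_1\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{dup2}, sp, os) \Rightarrow (sp + 2, os')}
\quad \text{if } (sp - 1 \ldots sp + 2) \subseteq \mathit{stackPointer}_{range}
\]

— The dup_x instruction duplicates the top k elements of the operand stack n
elements down the stack. The symbol $\uplus$ is the function overriding operator
and the notation $\{x_i \mid i \leftarrow [a..b]\}$ generates a set of $x_i$ where i
ranges from a to b.

\[
[\mathit{dupx}]\quad
\frac{\begin{array}{l}
  \vdash kn \bmod 16 \Rightarrow n,\\
  \vdash kn \mathbin{\mathrm{div}} 16 \Rightarrow k,\\
  \vdash sp + k \Rightarrow sp',\\
  \vdash os \uplus \{sp' - i + 1 \mapsto os(sp - i + 1) \mid i \leftarrow [n..1]\} \Rightarrow os',\\
  \vdash os' \uplus \{sp' - n - i + 1 \mapsto os(sp' - i + 1) \mid i \leftarrow [k..1]\} \Rightarrow os''
\end{array}}
     {\vdash (\mathit{dup\_x}\ kn, sp, os) \Rightarrow (sp + k, os'')}
\]
\[
\text{if } (sp - n \ldots sp + k) \subseteq \mathit{stackPointer}_{range}
\land k \in (1 \ldots 4) \land n \in (0 \ldots 8) \land k < n
\]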

— The swap and swap2 instructions swap the top two bytes and the top two 
pairs of bytes respectively on top of the operand stack. 

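\[
[\mathit{swap}]\quad
\frac{\begin{array}{l}
  \vdash (os(sp - 1), os(sp)) \Rightarrow (v_2, v_1),\\
  \vdash os \oplus \{sp - 1 \mapsto v_1\} \oplus \{sp \mapsto v_2\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{swap}, sp, os) \Rightarrow (sp, os')}
\quad \text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]

\[
[\mathit{swap2}]\quad
\frac{\begin{array}{l}
  \vdash (os(sp - 3), os(sp - 2)) \Rightarrow (hi_2, lo_2),\\
  \vdash (os(sp - 1), os(sp)) \Rightarrow (hi_1, lo_1),\\
  \vdash os \oplus \{sp - 3 \mapsto hi_1\} \oplus \{sp - 2 \mapsto lo_1\} \Rightarrow os',\\
  \vdash os' \oplus \{sp - 1 \mapsto hi_2\} \oplus \{sp \mapsto lo_2\} \Rightarrow os''
\end{array}}
     {\vdash (\mathit{swap2}, sp, os) \Rightarrow (sp, os'')}
\quad \text{if } (sp - 3 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]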
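The dup, swap and swap2 rules can be read as small functions on a pair (sp, os). The sketch below is an illustration only: the stack pointer range 0..7 is a hypothetical choice, the operand stack is modelled as a dictionary from index to byte, and a failed side condition is represented by returning None (no transition).

```python
STACK_POINTER_RANGE = range(0, 8)  # hypothetical bounds, chosen for the example


def in_range(lo, hi):
    """Side condition: both ends of (lo ... hi) are valid stack pointers."""
    return lo in STACK_POINTER_RANGE and hi in STACK_POINTER_RANGE


def dup(sp, os):
    """Rule [dup]: copy the top byte one slot up."""
    if not in_range(sp, sp + 1):
        return None  # would overflow (or sp itself is invalid)
    return sp + 1, {**os, sp + 1: os[sp]}


def swap(sp, os):
    """Rule [swap]: exchange the top two bytes."""
    if not in_range(sp - 1, sp):
        return None  # stack underflow
    v2, v1 = os[sp - 1], os[sp]
    return sp, {**os, sp - 1: v1, sp: v2}


def swap2(sp, os):
    """Rule [swap2]: exchange the top two pairs of bytes."""
    if not in_range(sp - 3, sp):
        return None  # stack underflow
    hi2, lo2 = os[sp - 3], os[sp - 2]
    hi1, lo1 = os[sp - 1], os[sp]
    return sp, {**os, sp - 3: hi1, sp - 2: lo1, sp - 1: hi2, sp: lo2}
```

Each function builds a new dictionary instead of mutating os, which mirrors the way the rules derive os' from os by overriding.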



4.7 Creating Array Objects 

Arrays are stored in the heap. Therefore, the transition relation specifies
read/write access to the heap, as well as the operand stack. In addition, object
creating instructions need to know which is the current application program 
id (pi). This information is used to classify objects according to who created 
them. The type of the relation reflects the fact that the program id is used but 
not changed. (The reader is reminded that Table Hsummarises the types of all 
transition relations.) 

The array operation newarray expects the length of the array on the top of
the operand stack. It accesses the length as al. The newarray instruction
creates an appropriate array header ah and a mapping with a domain of
0 ... al - 1 to serve as the initial value of the array. The method table used
is that of class java.lang.Object. The heap is extended with a new object which
is to receive the created array header and contents. The reference to the new
object is pushed onto the stack. The side condition ensures that stack
underflow, heap overflow, or an invalid array length is detected.

\[
[\mathit{newarray}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow al,\\
  \vdash \mathit{ArrayHeader}\ pi\ 0\ \mathit{java.lang.Object}_{mt}\ \mathit{byte}\ al \Rightarrow ah,\\
  \vdash \{i \mapsto 0 \mid i \leftarrow [0..al - 1]\} \Rightarrow at,\\
  \vdash hp + 1 \Rightarrow hp',\\
  \vdash s2p(hp') \Rightarrow (hi, lo),\\
  \vdash os \oplus \{sp - 1 \mapsto hi\} \oplus \{sp \mapsto lo\} \Rightarrow os',\\
  \vdash ha \oplus \{hp' \mapsto \mathit{ArrayObject}\ ah\ at\} \Rightarrow ha'
\end{array}}
     {\vdash (\mathit{newarray}\ \mathit{byte}, sp, os, hp, ha, pi, pt)
      \stackrel{newarray}{\Longrightarrow} (sp, os', hp', ha')}
\]
\[
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land al \in \mathit{arrayLength}_{range} \land hp' \in \mathit{heapPointer}_{range}
\]



The two other versions of newarray are not shown here: the bit version of 
newarray is identical to the byte version above, because each bit is stored in a 
byte field. The short version uses two bytes for storing each short. 
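To make the byte version of newarray concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the specification itself: the heap ha and the operand stack os are dictionaries, the record names ArrayHeader and ArrayObject and their field names are chosen for the example, the method table of java.lang.Object is a placeholder string, and the side conditions on the stack pointer, array length and heap pointer ranges are omitted.

```python
from collections import namedtuple

# Field names are hypothetical; the five header fields follow the order
# used in the rule: program id, class id, method table, element type, length.
ArrayHeader = namedtuple("ArrayHeader", "pi ci mt elem_type length")
ArrayObject = namedtuple("ArrayObject", "header table")


def s2p(s):
    return (s // 256, s % 256)


def p2s(hi, lo):
    return 256 * hi + lo


def newarray_byte(sp, os, hp, ha, pi, object_mt="java.lang.Object_mt"):
    """Allocate a zero-filled byte array whose length is the short on top of os."""
    al = p2s(os[sp - 1], os[sp])              # array length
    ah = ArrayHeader(pi, 0, object_mt, "byte", al)
    at = {i: 0 for i in range(al)}            # initial contents, domain 0..al-1
    new_hp = hp + 1                           # extend the heap by one object
    hi, lo = s2p(new_hp)
    new_os = {**os, sp - 1: hi, sp: lo}       # length replaced by the reference
    new_ha = {**ha, new_hp: ArrayObject(ah, at)}
    return sp, new_os, new_hp, new_ha
```

The stack pointer is returned unchanged because the two-byte length is overwritten in place by the two-byte reference.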



The anewarray instruction allocates an array of references to objects of the 
class associated with the given class id (ci). The application program id (pi) is 
used to access the class table of the current application program. This class table 
provides the method table for the array elements. The array is initialised to null 
references. 



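\[
[\mathit{anewarray}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow al,\\
  \vdash pt(pi) \Rightarrow ct,\\
  \vdash ct(ci) \Rightarrow \mathit{ClassObject}\ \_\ \_\ mt,\\
  \vdash \mathit{ArrayHeader}\ pi\ ci\ mt\ \mathit{ref}\ al \Rightarrow ah,\\
  \vdash \{i \mapsto 0 \mid i \leftarrow [0..2 * al - 1]\} \Rightarrow at,\\
  \vdash hp + 1 \Rightarrow hp',\\
  \vdash s2p(hp') \Rightarrow (hi, lo),\\
  \vdash os \oplus \{sp - 1 \mapsto hi\} \oplus \{sp \mapsto lo\} \Rightarrow os',\\
  \vdash ha \oplus \{hp' \mapsto \mathit{ArrayObject}\ ah\ at\} \Rightarrow ha'
\end{array}}
     {\vdash (\mathit{anewarray}\ ci, sp, os, hp, ha, pi, pt)
      \stackrel{newarray}{\Longrightarrow} (sp, os', hp', ha')}
\]
\[
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land al \in \mathit{arrayLength}_{range} \land hp' \in \mathit{heapPointer}_{range}
\]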



4.8 Loading Values from Arrays 



The operation arraylength expects an array reference r on the stack and returns
the length of the array. The side condition checks for stack underflow, and that
a valid heap pointer to an array object is presented.

\[
[\mathit{arraylength}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow r,\\
  \vdash ha(r) \Rightarrow \mathit{ArrayObject}\ (\mathit{ArrayHeader}\ \_\ \_\ \_\ \_\ al)\ \_,\\
  \vdash s2p(al) \Rightarrow (hi, lo),\\
  \vdash os \oplus \{sp - 1 \mapsto hi\} \oplus \{sp \mapsto lo\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{arraylength}, sp, os, ha) \Rightarrow (sp, os')}
\]
\[
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isArrayObject}(ha(r))
\]

Array load instructions access an array and deliver a value at the given 
index position. The side conditions check for stack underflow, a null reference, 
an improper object and illegal values of the array index. 
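
\[
[\mathit{baload}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 3), os(sp - 2)) \Rightarrow r,\\
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow i,\\
  \vdash ha(r) \Rightarrow \mathit{ArrayObject}\ \_\ at,\\
  \vdash at(i) \Rightarrow v
\end{array}}
     {\vdash (\mathit{baload}, sp, os, ha) \Rightarrow (sp - 3, os \oplus \{sp - 3 \mapsto v\})}
\]
\[
\text{if } (sp - 3 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isArrayObject}(ha(r))
\land i \in \mathit{domain}(at)
\]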

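\[
[\mathit{saload}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 3), os(sp - 2)) \Rightarrow r,\\
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow i,\\
  \vdash ha(r) \Rightarrow \mathit{ArrayObject}\ \_\ at,\\
  \vdash (at(i * 2), at(i * 2 + 1)) \Rightarrow (hi, lo),\\
  \vdash os \oplus \{sp - 3 \mapsto hi\} \oplus \{sp - 2 \mapsto lo\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{saload}, sp, os, ha) \Rightarrow (sp - 2, os')}
\]
\[
\text{if } (sp - 3 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isArrayObject}(ha(r))
\land (i * 2 \ldots i * 2 + 1) \subseteq \mathit{domain}(at)
\]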

The operation aaload is identical to saload and not shown here. 



4.9 Storing Values into Arrays 

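The array store instructions need read access to the stack and read/write
access to the heap. The side conditions check for stack underflow, null
references, non-array objects, and illegal array indices. The aastore
instruction is identical to sastore.

\[
[\mathit{bastore}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 4), os(sp - 3)) \Rightarrow r,\\
  \vdash p2s(os(sp - 2), os(sp - 1)) \Rightarrow i,\\
  \vdash os(sp) \Rightarrow v,\\
  \vdash ha(r) \Rightarrow \mathit{ArrayObject}\ ah\ at,\\
  \vdash at \oplus \{i \mapsto v\} \Rightarrow at',\\
  \vdash ha \oplus \{r \mapsto \mathit{ArrayObject}\ ah\ at'\} \Rightarrow ha'
\end{array}}
     {\vdash (\mathit{bastore}, sp, os, ha) \Rightarrow (sp - 5, ha')}
\]
\[
\text{if } (sp - 4 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isArrayObject}(ha(r))
\land i \in \mathit{domain}(at)
\]

\[
[\mathit{sastore}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 5), os(sp - 4)) \Rightarrow r,\\
  \vdash p2s(os(sp - 3), os(sp - 2)) \Rightarrow i,\\
  \vdash (os(sp - 1), os(sp)) \Rightarrow (hi, lo),\\
  \vdash ha(r) \Rightarrow \mathit{ArrayObject}\ ah\ at,\\
  \vdash at \oplus \{i * 2 \mapsto hi\} \oplus \{i * 2 + 1 \mapsto lo\} \Rightarrow at',\\
  \vdash ha \oplus \{r \mapsto \mathit{ArrayObject}\ ah\ at'\} \Rightarrow ha'
\end{array}}
     {\vdash (\mathit{sastore}, sp, os, ha) \Rightarrow (sp - 6, ha')}
\]
\[
\text{if } (sp - 5 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isArrayObject}(ha(r))
\land (i * 2 \ldots i * 2 + 1) \subseteq \mathit{domain}(at)
\]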
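The short array store and its matching load can be sketched in Python as follows. This is an illustrative reading of the sastore and saload rules, with assumptions: an array object is modelled as a (header, table) tuple, the heap and operand stack are dictionaries, the stack pointer and heap pointer range checks are omitted, and a failed index check is signalled by returning None.

```python
def p2s(hi, lo):
    return 256 * hi + lo


def sastore(sp, os, ha):
    """Store the short on top of the stack at index i of the array referenced by r."""
    r = p2s(os[sp - 5], os[sp - 4])
    i = p2s(os[sp - 3], os[sp - 2])
    hi, lo = os[sp - 1], os[sp]
    header, table = ha[r]
    if 2 * i not in table or 2 * i + 1 not in table:
        return None                      # illegal array index: no transition
    new_table = {**table, 2 * i: hi, 2 * i + 1: lo}
    return sp - 6, {**ha, r: (header, new_table)}


def saload(sp, os, ha):
    """Load the short at index i of the array referenced by r onto the stack."""
    r = p2s(os[sp - 3], os[sp - 2])
    i = p2s(os[sp - 1], os[sp])
    header, table = ha[r]
    if 2 * i not in table or 2 * i + 1 not in table:
        return None                      # illegal array index: no transition
    hi, lo = table[2 * i], table[2 * i + 1]
    return sp - 2, {**os, sp - 3: hi, sp - 2: lo}
```

Each short occupies two consecutive byte slots of the array table, which is why the index i is doubled in both functions.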



4.10 Arithmetic 

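The unary (arithmetic) negation operator is defined below for bytes and shorts.
It ignores underflow and overflow of values, but checks for stack underflow.

\[
[\mathit{bneg}]\quad
\frac{\vdash os(sp) \Rightarrow v}
     {\vdash (\mathit{bneg}, sp, os) \Rightarrow (sp, os \oplus \{sp \mapsto n2b(-v)\})}
\quad \text{if } sp \in \mathit{stackPointer}_{range}
\]

\[
[\mathit{sneg}]\quad
\frac{\begin{array}{l}
  \vdash s2p(n2s(-(p2s(os(sp - 1), os(sp))))) \Rightarrow (hi, lo),\\
  \vdash os \oplus \{sp - 1 \mapsto hi\} \oplus \{sp \mapsto lo\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{sneg}, sp, os) \Rightarrow (sp, os')}
\quad \text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]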

Binary addition for bytes and shorts is defined below. The other binary
arithmetic instructions (for subtraction, multiplication, division and
remainder) are defined in the same way, and are not shown. The side conditions
of the division and remainder operations check that the divisor is non-zero.

\[
[\mathit{badd}]\quad
\frac{\vdash (os(sp - 1), os(sp)) \Rightarrow (v_2, v_1)}
     {\vdash (\mathit{badd}, sp, os) \Rightarrow (sp - 1, os \oplus \{sp - 1 \mapsto n2b(v_2 + v_1)\})}
\quad \text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]

\[
[\mathit{sadd}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 3), os(sp - 2)) \Rightarrow v_2,\\
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow v_1,\\
  \vdash s2p(n2s(v_2 + v_1)) \Rightarrow (hi, lo),\\
  \vdash os \oplus \{sp - 3 \mapsto hi\} \oplus \{sp - 2 \mapsto lo\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{sadd}, sp, os) \Rightarrow (sp - 2, os')}
\quad \text{if } (sp - 3 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]
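A short Python sketch shows how the two addition rules wrap results back into range. It assumes the floor-based reading of mod and div used by Python's % and //, models os as a dictionary, and leaves out the stack pointer range checks.

```python
def n2b(n):
    return n % 128


def n2s(n):
    return n % 32768


def s2p(s):
    return (s // 256, s % 256)


def p2s(hi, lo):
    return 256 * hi + lo


def badd(sp, os):
    """Rule [badd]: add the top two bytes, wrapping with n2b."""
    v2, v1 = os[sp - 1], os[sp]
    return sp - 1, {**os, sp - 1: n2b(v2 + v1)}


def sadd(sp, os):
    """Rule [sadd]: add the top two shorts, wrapping with n2s."""
    v2 = p2s(os[sp - 3], os[sp - 2])
    v1 = p2s(os[sp - 1], os[sp])
    hi, lo = s2p(n2s(v2 + v1))
    return sp - 2, {**os, sp - 3: hi, sp - 2: lo}
```

As in the rules, slots above the new stack pointer are left untouched in os; only sp determines which entries are live.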



4.11 Logical Instructions 

The logical shift left as defined below shifts the element next to the top of the 
stack. The shift count is the top of the stack. The remaining binary logical 
instructions (for arithmetic shift right with sign extension, unsigned shift right, 
bit-wise and, bit-wise or and bit-wise exclusive or) are defined in the same way 
and are not shown. 

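\[
[\mathit{bshl}]\quad
\frac{\vdash (os(sp - 1), os(sp)) \Rightarrow (v_2, v_1)}
     {\vdash (\mathit{bshl}, sp, os) \Rightarrow (sp - 1, os \oplus \{sp - 1 \mapsto n2b(v_2 \ll v_1)\})}
\]
\[
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range} \land v_1 \in (0 \ldots 7)
\]

\[
[\mathit{sshl}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 3), os(sp - 2)) \Rightarrow v_2,\\
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow v_1,\\
  \vdash s2p(n2s(v_2 \ll v_1)) \Rightarrow (hi, lo),\\
  \vdash os \oplus \{sp - 3 \mapsto hi\} \oplus \{sp - 2 \mapsto lo\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{sshl}, sp, os) \Rightarrow (sp - 2, os')}
\]
\[
\text{if } (sp - 3 \ldots sp) \subseteq \mathit{stackPointer}_{range} \land v_1 \in (0 \ldots 15)
\]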



4.12 Conversions 

The conversion operations explicitly truncate a short to a byte or zero fill a byte 
to a short. Stack underflow and overflow are detected. 
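
\[
[\mathit{s2b}]\quad
\frac{\vdash n2b(p2s(os(sp - 1), os(sp))) \Rightarrow v}
     {\vdash (\mathit{s2b}, sp, os) \Rightarrow (sp - 1, os \oplus \{sp - 1 \mapsto v\})}
\quad \text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]

\[
[\mathit{b2s}]\quad
\frac{\begin{array}{l}
  \vdash s2p(os(sp)) \Rightarrow (hi, lo),\\
  \vdash os \oplus \{sp \mapsto hi\} \oplus \{sp + 1 \mapsto lo\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{b2s}, sp, os) \Rightarrow (sp + 1, os')}
\quad \text{if } (sp \ldots sp + 1) \subseteq \mathit{stackPointer}_{range}
\]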



4.13 Comparisons 

The compare instruction bcmp returns -1 if the top element of the stack is
greater than the one below it. It returns 0 if the top two elements are equal and
1 otherwise. The scmp instruction compares the shorts on top of the stack. The
definition of the range comparison operator == is given in Table 3.

\[
[\mathit{bcmp}]\quad
\frac{\vdash (os(sp - 1), os(sp)) \Rightarrow (v_2, v_1)}
     {\vdash (\mathit{bcmp}, sp, os) \Rightarrow (sp - 1, os \oplus \{sp - 1 \mapsto (v_2 == v_1)\})}
\quad \text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]

\[
[\mathit{scmp}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 3), os(sp - 2)) \Rightarrow v_2,\\
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow v_1,\\
  \vdash os \oplus \{sp - 3 \mapsto (v_2 == v_1)\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{scmp}, sp, os) \Rightarrow (sp - 3, os')}
\quad \text{if } (sp - 3 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]

The acmp instruction compares object references and returns 0 if the refer- 
ences are equal, 1 otherwise. 

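\[
[\mathit{acmp}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 3), os(sp - 2)) \Rightarrow v_2,\\
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow v_1
\end{array}}
     {\vdash (\mathit{acmp}, sp, os) \Rightarrow (sp - 3, os \oplus \{sp - 3 \mapsto (v_2 == v_1) \bmod 2\})}
\quad \text{if } (sp - 3 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]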



4.14 Transferring Control 

The ifeq instruction adds its immediate operand to the value of the program 
counter (pc) if the top of the stack contains 0. Otherwise the program counter is 
incremented to point at the next instruction. Stack underflow is detected. The 
static semantics is assumed to detect illegal values for the program counter. 
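
\[
[\mathit{ifeq}^{0}]\quad
\frac{\begin{array}{l}
  \vdash os(sp) \Rightarrow v,\\
  \vdash pc + p2s\ \mathit{offset} \Rightarrow pc'
\end{array}}
     {\vdash (pc, \mathit{ifeq}\ \mathit{offset}, sp, os) \stackrel{control}{\Longrightarrow} (pc', sp - 1, os)}
\quad \text{if } sp \in \mathit{stackPointer}_{range} \land v = 0
\]

\[
[\mathit{ifeq}^{1}]\quad
\frac{\vdash os(sp) \Rightarrow v}
     {\vdash (pc, \mathit{ifeq}\ \mathit{offset}, sp, os) \stackrel{control}{\Longrightarrow} (pc + 1, sp - 1, os)}
\quad \text{if } sp \in \mathit{stackPointer}_{range} \land v \neq 0
\]

The remaining operations iflt, ifgt, ifne, ifge, ifle are similar and not shown.
The static semantics is assumed to check that the unconditional jump
instruction goto carries a valid offset.

\[
[\mathit{goto}]\quad
\frac{\vdash pc + p2s\ \mathit{offset} \Rightarrow pc'}
     {\vdash (pc, \mathit{goto}\ \mathit{offset}, sp, os) \stackrel{control}{\Longrightarrow} (pc', sp, os)}
\]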



4.15 Support for Switch Statements 

The tableswitch and lookupswitch instructions provide support for the Java
switch statements. The tableswitch instruction allows for a selection of jump
targets from an indexed table, with the choice index coming from the stack. The
lookupswitch instruction is similar, except that a keyed table is used rather than
an indexed one.



Both instructions have a number of immediate operands, the first of which is
the default offset. The tableswitch instruction has further immediate operands
to specify the lower and upper bounds of a jump table and the jump table itself.
The instruction expects a byte index on the stack, which is used to select the
appropriate offset from the jump table. The offset is then added to the current
value of the program counter. If the index lies outside the range defined by the
lower and upper bound, the default offset is added to the program counter.



The side condition checks that the stack pointer is valid, but does not need 
to check that the old or new values of the program counter are valid. This is the 
task of the static semantics. 



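\[
[\mathit{tableswitch}^{0}]\quad
\frac{\begin{array}{l}
  \vdash os(sp) \Rightarrow \mathit{index},\\
  \vdash \mathit{cases}(\mathit{index}) \Rightarrow \mathit{offset},\\
  \vdash pc + p2s(\mathit{offset}) \Rightarrow pc'
\end{array}}
     {\vdash (pc, \mathit{tableswitch}\ \mathit{default}\ \mathit{low}\ \mathit{high}\ \mathit{cases}, sp, os)
      \stackrel{switch}{\Longrightarrow} (pc', sp - 1, os)}
\]
\[
\text{if } sp \in \mathit{stackPointer}_{range} \land \mathit{index} \in (\mathit{low} \ldots \mathit{high})
\]

\[
[\mathit{tableswitch}^{1}]\quad
\frac{\begin{array}{l}
  \vdash os(sp) \Rightarrow \mathit{index},\\
  \vdash pc + p2s(\mathit{default}) \Rightarrow pc'
\end{array}}
     {\vdash (pc, \mathit{tableswitch}\ \mathit{default}\ \mathit{low}\ \mathit{high}\ \mathit{cases}, sp, os)
      \stackrel{switch}{\Longrightarrow} (pc', sp - 1, os)}
\]
\[
\text{if } sp \in \mathit{stackPointer}_{range} \land \mathit{index} \notin (\mathit{low} \ldots \mathit{high})
\]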



The lookupswitch has a default offset and further immediate operands to 
specify the number of entries in the jump table and the jump table itself. The 
lookupswitch instruction expects a key on the stack, which when it occurs in 
the table is used to select the appropriate offset from the jump table. The offset 
is then added to the current value of the program counter. If the key does not 
occur in the jump table, the default offset is added to the program counter. 



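\[
[\mathit{lookupswitch}^{0}]\quad
\frac{\begin{array}{l}
  \vdash os(sp) \Rightarrow \mathit{key},\\
  \vdash \{o \mid (k, o) \leftarrow \mathit{range}(\mathit{cases}) \land \mathit{key} = k\} \Rightarrow \mathit{offsets},\\
  \vdash pc + p2s(\mathit{hd}(\mathit{offsets})) \Rightarrow pc'
\end{array}}
     {\vdash (pc, \mathit{lookupswitch}\ \mathit{default}\ \mathit{entries}\ \mathit{cases}, sp, os)
      \stackrel{switch}{\Longrightarrow} (pc', sp - 1, os)}
\]
\[
\text{if } sp \in \mathit{stackPointer}_{range} \land \mathit{offsets} \neq \{\}
\]

\[
[\mathit{lookupswitch}^{1}]\quad
\frac{\begin{array}{l}
  \vdash os(sp) \Rightarrow \mathit{key},\\
  \vdash \{o \mid (k, o) \leftarrow \mathit{range}(\mathit{cases}) \land \mathit{key} = k\} \Rightarrow \mathit{offsets},\\
  \vdash pc + p2s(\mathit{default}) \Rightarrow pc'
\end{array}}
     {\vdash (pc, \mathit{lookupswitch}\ \mathit{default}\ \mathit{entries}\ \mathit{cases}, sp, os)
      \stackrel{switch}{\Longrightarrow} (pc', sp - 1, os)}
\]
\[
\text{if } sp \in \mathit{stackPointer}_{range} \land \mathit{offsets} = \{\}
\]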
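The two switch instructions differ only in how the offset is selected, which a small Python sketch makes visible. The sketch is an illustration under assumptions: offsets are (hi, lo) byte pairs, the jump table of tableswitch is a dictionary from index to offset, the table of lookupswitch is a list of (key, offset) pairs, and the stack pointer range check is omitted.

```python
def p2s(hi, lo):
    return 256 * hi + lo


def tableswitch(pc, default, low, high, cases, sp, os):
    """Select an offset by index; fall back to default outside (low ... high)."""
    index = os[sp]
    offset = cases[index] if low <= index <= high else default
    return pc + p2s(*offset), sp - 1, os


def lookupswitch(pc, default, cases, sp, os):
    """Select the first offset whose key matches; fall back to default."""
    key = os[sp]
    offsets = [o for (k, o) in cases if k == key]
    offset = offsets[0] if offsets else default
    return pc + p2s(*offset), sp - 1, os
```

In both cases the selector is popped from the stack and the new program counter is computed by the instruction itself, so no automatic increment applies.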



4.16 Exception Handling 

The athrow instruction terminates the execution of the JSP program, for there 
is no pc, sp and os for which the relation below holds. The present treatment of 
exceptions is somewhat crude, but consistent with ISO 7816-4 requirements. 

\[
[\mathit{athrow}]\quad
\vdash (pc, \mathit{athrow}, sp, os) \stackrel{exception}{\Longrightarrow} (pc, sp, os)
\quad \text{if } \mathit{False}
\]

The jsr and ret instructions are used by the JVM to support exception han- 
dling. Even though the JSP provides only rudimentary support for exceptions, 
the semantics of these two instructions is well defined. Stack overflow and illegal 
return addresses are detected. 

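\[
[\mathit{ret}]\quad
\frac{\vdash p2s(os(i), os(i + 1)) \Rightarrow pc'}
     {\vdash (pc, \mathit{ret}\ i, sp, os) \stackrel{exception}{\Longrightarrow} (pc', sp, os)}
\quad \text{if } pc' \in \mathit{programCounter}_{range}
\]

\[
[\mathit{jsr}]\quad
\frac{\begin{array}{l}
  \vdash pc + p2s(hi_v, lo_v) \Rightarrow pc',\\
  \vdash s2p(pc) \Rightarrow (hi_p, lo_p),\\
  \vdash os \oplus \{sp + 1 \mapsto hi_p\} \oplus \{sp + 2 \mapsto lo_p\} \Rightarrow os'
\end{array}}
     {\vdash (pc, \mathit{jsr}\ (hi_v, lo_v), sp, os) \stackrel{exception}{\Longrightarrow} (pc', sp + 2, os')}
\]
\[
\text{if } (sp + 1 \ldots sp + 2) \subseteq \mathit{stackPointer}_{range}
\land pc' \in \mathit{programCounter}_{range}
\]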



4.17 Method Invocation 

The JSP has three different instructions to invoke methods. The invokevirtual
instruction is the normal dynamic method dispatch instruction. The invoke
instruction is used when the Java compiler or the JVM to JSP byte code
translator is able to determine statically which method to invoke. The
invokeinterface instruction supports Java’s approach to multiple inheritance by
searching for a method that implements an abstract method from an interface.

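The invokeinterface instruction has three operands. The first, params,
specifies the number of arguments to be expected on the operand stack. The second
immediate operand, ii, indicates the index of an interface. The third, mi,
determines which (abstract) method within the interface is required.

\[
[\mathit{invokeinterface}]\quad
\frac{\begin{array}{l}
  \vdash pt(pi) \Rightarrow ct,\\
  \vdash p2s(os(sp - \mathit{params} + 1), os(sp - \mathit{params} + 2)) \Rightarrow r,\\
  \vdash ha(r) \Rightarrow \mathit{RegularObject}\ oh\ \_,\\
  \vdash oh \Rightarrow \mathit{ObjectHeader}\ \_\ \_\ (\mathit{MethodTable}\ ci\ et),\\
  \vdash ct(ci) \Rightarrow \mathit{ClassObject}\ cit,\\
  \vdash cit(ii) \Rightarrow it,\\
  \vdash it(mi) \Rightarrow mi',\\
  \vdash et(mi') \Rightarrow pc',\\
  \vdash \{\mathit{params} - i \mapsto os(sp - i + 1) \mid i \leftarrow [1..\mathit{params}]\} \Rightarrow os',\\
  \vdash fa \oplus \{fp + 1 \mapsto \mathit{Frame}\ (pc + 1)\ fp\ (sp - \mathit{params})\ os\} \Rightarrow fa'
\end{array}}
     {\vdash (pc, ca, \mathit{invokeinterface}\ \mathit{params}\ ii\ mi, sp, os, fp, fa, ha, pi, pt)
      \stackrel{invokeinterface}{\Longrightarrow} (pc', \mathit{params} - 1, os', fp + 1, fa')}
\]
\[
\begin{array}{l}
\text{if } (sp - \mathit{params} + 1 \ldots sp) \subseteq \mathit{stackPointer}_{range} \land {}\\
\quad r \in \mathit{heapPointer}_{range} \land \mathit{isRegularObject}(ha(r)) \land {}\\
\quad ci \in \mathit{classId}_{range} \land ii \in \mathit{interfaceId}_{range} \land {}\\
\quad mi \in \mathit{methodId}_{range} \land mi' \in \mathit{methodId}_{range} \land {}\\
\quad fp + 1 \in \mathit{framePointer}_{range}
\end{array}
\]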



The top of the operand stack must contain a reference to an object, which 
should be an instance of a regular class that implements the interface method. 
The header of the object is accessed to yield the interface table (cit) associated 
with the object. The table it maps the method index of the abstract method 
(mi) onto the method index of the implementation (mi'). The latter is then used 
to locate the appropriate program counter in the method table of the object 
pointed at by r. The value of the program counter will be made to point at the 
first proper instruction of the method. A new frame is created, linking to the 
previous frame for the benefit of the return instruction. Execution continues at 
the first instruction of the callee. 

The invokevirtual instruction expects a reference to an object on top of the
operand stack. The object header of the object is accessed to yield the method
table associated with the object. The method index mi determines which method
is to be activated. The value of the program counter pc will be made to point
at the first proper instruction of the method. A new frame is created, linking to
the previous frame for the benefit of the return instruction. Execution continues
at the first instruction of the callee.

\[
[\mathit{invokevirtual}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - \mathit{params} + 1), os(sp - \mathit{params} + 2)) \Rightarrow r,\\
  \vdash ha(r) \Rightarrow \mathit{RegularObject}\ oh\ \_,\\
  \vdash oh \Rightarrow \mathit{ObjectHeader}\ \_\ \_\ (\mathit{MethodTable}\ \_\ et),\\
  \vdash et(mi) \Rightarrow pc',\\
  \vdash \{\mathit{params} - i \mapsto os(sp - i + 1) \mid i \leftarrow [1..\mathit{params}]\} \Rightarrow os',\\
  \vdash fa \oplus \{fp + 1 \mapsto \mathit{Frame}\ (pc + 1)\ fp\ (sp - \mathit{params})\ os\} \Rightarrow fa'
\end{array}}
     {\vdash (pc, ca, \mathit{invokevirtual}\ \mathit{params}\ mi, sp, os, fp, fa, ha)
      \stackrel{invokevirtual}{\Longrightarrow} (pc', \mathit{params} - 1, os', fp + 1, fa')}
\]
\[
\begin{array}{l}
\text{if } (sp - \mathit{params} + 1 \ldots sp) \subseteq \mathit{stackPointer}_{range} \land {}\\
\quad r \in \mathit{heapPointer}_{range} \land \mathit{isRegularObject}(ha(r)) \land {}\\
\quad mi \in \mathit{methodId}_{range} \land fp + 1 \in \mathit{framePointer}_{range}
\end{array}
\]

The immediate operands of the invoke instruction specify the two bytes that 
determine the index of the method in the codeArea. The number of parameters 
is retrieved from the method header (which is stored in the pseudo instruction 
preceding the first proper instruction of the method). 
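
\[
[\mathit{invoke}]\quad
\frac{\begin{array}{l}
  \vdash p2s\ \mathit{offset} \Rightarrow pc',\\
  \vdash ca(pc' - 1) \Rightarrow (\mathit{MethodHeader}\ \mathit{params}\ \mathit{locals}),\\
  \vdash \{\mathit{params} - i \mapsto os(sp + 1 - i) \mid i \leftarrow [1..\mathit{params}]\} \Rightarrow os',\\
  \vdash fa \oplus \{fp + 1 \mapsto \mathit{Frame}\ (pc + 1)\ fp\ (sp - \mathit{params})\ os\} \Rightarrow fa'
\end{array}}
     {\vdash (pc, ca, \mathit{invoke}\ \mathit{offset}, sp, os, fp, fa)
      \stackrel{invoke}{\Longrightarrow} (pc', \mathit{locals} + \mathit{params} - 1, os', fp + 1, fa')}
\]
\[
\text{if } (sp - \mathit{params} + 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land fp + 1 \in \mathit{framePointer}_{range}
\]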
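The frame handling shared by the invoke instructions can be illustrated with a Python sketch of invokevirtual. It is a simplified reading, not the specification: a heap object is modelled as a dictionary with a hypothetical "method_table" entry mapping method indices to start addresses, the Frame record and its field names are ours, and all range checks are omitted.

```python
from collections import namedtuple

# Saved caller state: return address, caller frame pointer,
# caller stack pointer after popping the arguments, caller operand stack.
Frame = namedtuple("Frame", "pc fp sp os")


def p2s(hi, lo):
    return 256 * hi + lo


def invokevirtual(pc, params, mi, sp, os, fp, fa, ha):
    """Dispatch on the object referenced by the first two argument bytes."""
    r = p2s(os[sp - params + 1], os[sp - params + 2])
    method_table = ha[r]["method_table"]      # hypothetical object layout
    new_pc = method_table[mi]
    # Copy the params argument bytes to slots 0..params-1 of the callee stack.
    new_os = {params - i: os[sp - i + 1] for i in range(1, params + 1)}
    # Save the caller's state in a new frame at fp + 1.
    new_fa = {**fa, fp + 1: Frame(pc + 1, fp, sp - params, os)}
    return new_pc, params - 1, new_os, fp + 1, new_fa
```

The saved frame is exactly what the return instructions consult: it holds the return address pc + 1 and the caller's stack pointer with the arguments already removed.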



4.18 Method Return 

The return instructions below return from a (non-static) method. The four in- 
structions differ only in the return value produced. Each return instruction aban- 
dons the frame pointed at by the frame pointer and returns to the previous frame 
pointer. The appropriate return value is deposited onto the operand stack of the 
caller (except in the last case below, which is intended for a void returning 
method). The side conditions check for stack under/overflow and frame under- 
flow. 

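\[
[\mathit{breturn}]\quad
\frac{\begin{array}{l}
  \vdash fa(fp) \Rightarrow \mathit{Frame}\ pc'\ fp'\ sp'\ os',\\
  \vdash os(sp) \Rightarrow v,\\
  \vdash os' \oplus \{sp' + 1 \mapsto v\} \Rightarrow os''
\end{array}}
     {\vdash (\mathit{breturn}, sp, os, fp, fa) \stackrel{return}{\Longrightarrow} (pc', sp' + 1, os'', fp')}
\]
\[
\text{if } fp \in \mathit{framePointer}_{range} \land sp \in \mathit{stackPointer}_{range}
\land (sp' + 1) \in \mathit{stackPointer}_{range}
\]

\[
[\mathit{sreturn}]\quad
\frac{\begin{array}{l}
  \vdash fa(fp) \Rightarrow \mathit{Frame}\ pc'\ fp'\ sp'\ os',\\
  \vdash (os(sp - 1), os(sp)) \Rightarrow (hi, lo),\\
  \vdash os' \oplus \{sp' + 1 \mapsto hi\} \oplus \{sp' + 2 \mapsto lo\} \Rightarrow os''
\end{array}}
     {\vdash (\mathit{sreturn}, sp, os, fp, fa) \stackrel{return}{\Longrightarrow} (pc', sp' + 2, os'', fp')}
\]
\[
\text{if } fp \in \mathit{framePointer}_{range}
\land (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land (sp' + 1 \ldots sp' + 2) \subseteq \mathit{stackPointer}_{range}
\]

\[
[\mathit{return}]\quad
\frac{\vdash fa(fp) \Rightarrow \mathit{Frame}\ pc'\ fp'\ sp'\ os'}
     {\vdash (\mathit{return}, sp, os, fp, fa) \stackrel{return}{\Longrightarrow} (pc', sp', os', fp')}
\quad \text{if } fp \in \mathit{framePointer}_{range}
\]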

The instruction areturn is identical to sreturn and thus not shown here. 



4.19 Object Operations 

The new operation creates an instance of the class identified by the given class
index ci. The class index is used to look up the class in the class table
pertaining to the current application program, which itself is found by using
the current application program id pi as an index in the application program
table. The fields are initialised to zeroes.

\[
[\mathit{new}]\quad
\frac{\begin{array}{l}
  \vdash pt(pi) \Rightarrow ct,\\
  \vdash ct(ci) \Rightarrow \mathit{ClassObject}\ oh\ is,\\
  \vdash \{i \mapsto 0 \mid i \leftarrow [0..is - 1]\} \Rightarrow ft,\\
  \vdash hp + 1 \Rightarrow hp',\\
  \vdash s2p(hp') \Rightarrow (hi_r, lo_r),\\
  \vdash os \oplus \{sp + 1 \mapsto hi_r\} \oplus \{sp + 2 \mapsto lo_r\} \Rightarrow os',\\
  \vdash ha \oplus \{hp' \mapsto \mathit{RegularObject}\ oh\ ft\} \Rightarrow ha'
\end{array}}
     {\vdash (\mathit{new}\ ci, sp, os, hp, ha, pi, pt) \Rightarrow (sp + 2, os', hp', ha')}
\]
\[
\text{if } (sp + 1 \ldots sp + 2) \subseteq \mathit{stackPointer}_{range}
\land pi \in \mathit{progId}_{range} \land ci \in \mathit{classId}_{range}
\land hp' \in \mathit{heapPointer}_{range}
\]

There are three instructions to determine whether an object is an instance of a 
particular class. The instanceof instruction is for regular objects. The two other 
instructions ainstanceof and aainstanceof handle array objects of primitive and 
non-primitive types respectively. 

The immediate operand $ci_t$ of the instruction instanceof must be the index
into the class table of some regular class, t say. In addition, the top of the stack
must contain a reference r to a regular object of some class, s say. If t and s are
the same, or if t is a superclass of s, the instruction pushes 1 on the operand
stack; 0 otherwise. (See Table 3 for the definition of b2b.)

\[
[\mathit{instanceof}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow r,\\
  \vdash ha(r) \Rightarrow \mathit{RegularObject}\ (\mathit{ObjectHeader}\ \_\ \_\ (\mathit{MethodTable}\ ci_s\ \_))\ \_,\\
  \vdash pt(pi) \Rightarrow ct,\\
  \vdash ct(ci_s) \Rightarrow \mathit{ClassObject}\ \mathit{super},\\
  \vdash b2b(ci_t = ci_s \lor ci_t \in \mathit{range}(\mathit{super})) \Rightarrow v,\\
  \vdash os \oplus \{sp - 1 \mapsto v\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{instanceof}\ ci_t, sp, os, hp, ha, pi, pt) \Rightarrow (sp - 1, os')}
\]
\[
\begin{array}{l}
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range} \land {}\\
\quad r \in \mathit{heapPointer}_{range} \land \mathit{isRegularObject}(ha(r)) \land {}\\
\quad pi \in \mathit{progId}_{range} \land ci_s \in \mathit{classId}_{range}
\end{array}
\]

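The immediate operand $dt_t$ of the instruction ainstanceof must specify one
of the three primitive array types, t say. The top of the stack r must point at
an array of primitive types, s say. If t and s are the same, 1 is pushed on the
operand stack; 0 otherwise.

\[
[\mathit{ainstanceof}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow r,\\
  \vdash ha(r) \Rightarrow \mathit{ArrayObject}\ (\mathit{ArrayHeader}\ \_\ \_\ \_\ dt_s\ \_)\ \_,\\
  \vdash b2b(dt_s \in \{\mathit{bit}, \mathit{byte}, \mathit{short}\} \land dt_t = dt_s) \Rightarrow v,\\
  \vdash os \oplus \{sp - 1 \mapsto v\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{ainstanceof}\ dt_t, sp, os, hp, ha, pi, pt) \Rightarrow (sp - 1, os')}
\]
\[
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isArrayObject}(ha(r))
\]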

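The immediate operand $ci_t$ of the instruction aainstanceof must be the index
into the class table of some regular class, t say. The top of the stack must contain
a reference r to an array object, whose elements are instances of some class, s
say. If t and s are the same, or if t is a superclass of s, the instruction pushes 1
on the operand stack; 0 otherwise.

\[
[\mathit{aainstanceof}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow r,\\
  \vdash ha(r) \Rightarrow \mathit{ArrayObject}\ (\mathit{ArrayHeader}\ \_\ ci_s\ \_\ \mathit{ref}\ \_)\ \_,\\
  \vdash pt(pi) \Rightarrow ct,\\
  \vdash ct(ci_s) \Rightarrow \mathit{ClassObject}\ \mathit{super}\ \_,\\
  \vdash b2b(ci_t = ci_s \lor ci_t \in \mathit{range}(\mathit{super})) \Rightarrow v,\\
  \vdash os \oplus \{sp - 1 \mapsto v\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{aainstanceof}\ ci_t, sp, os, hp, ha, pi, pt) \Rightarrow (sp - 1, os')}
\]
\[
\begin{array}{l}
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range} \land {}\\
\quad r \in \mathit{heapPointer}_{range} \land \mathit{isArrayObject}(ha(r)) \land {}\\
\quad pi \in \mathit{progId}_{range} \land ci_s \in \mathit{classId}_{range}
\end{array}
\]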

The three instructions checkcast, acheckcast and aacheckcast below handle
regular objects, array objects of primitive types and array objects of
non-primitive types respectively, in the same way as the three ‘instance of’
instructions above.

The checkcast instruction permits a null reference to be cast to any other
reference. Otherwise instanceof is used to determine whether the cast is
acceptable. The operand stack is unaffected.

\[
[\mathit{checkcast}^{0}]\quad
\frac{\vdash p2s(os(sp - 1), os(sp)) \Rightarrow r}
     {\vdash (\mathit{checkcast}\ ci, sp, os, hp, ha, pi, pt) \Rightarrow (sp, os)}
\]
\[
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r = \mathit{null\ reference}
\]

\[
[\mathit{checkcast}^{1}]\quad
\frac{\begin{array}{l}
  \vdash (\mathit{instanceof}\ ci, sp, os, hp, ha, pi, pt) \Rightarrow (sp', os'),\\
  \vdash os'(sp') \Rightarrow v
\end{array}}
     {\vdash (\mathit{checkcast}\ ci, sp, os, hp, ha, pi, pt) \Rightarrow (sp, os)}
\quad \text{if } v = 1
\]

The two instructions acheckcast and aacheckcast rely on the appropriate 
‘instance of’ instructions in a similar way. They are not shown here. 
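The relationship between instanceof and checkcast can be sketched in Python. The sketch is illustrative and makes several assumptions: it works directly on a reference r instead of the operand stack, heap objects carry a hypothetical "class_id" entry, each class table entry carries a hypothetical "super" table whose range holds the superclass ids, and the null reference is represented by 0.

```python
NULL_REFERENCE = 0  # assumption: the encoding of the null reference


def b2b(flag):
    return 1 if flag else 0


def instanceof(ci_t, r, ha, ct):
    """Return 1 if the object at r has class ci_t or ci_t among its superclasses."""
    ci_s = ha[r]["class_id"]
    super_table = ct[ci_s]["super"]
    return b2b(ci_t == ci_s or ci_t in super_table.values())


def checkcast(ci, r, ha, ct):
    """A cast is acceptable for the null reference or when instanceof yields 1."""
    if r == NULL_REFERENCE:
        return True
    return instanceof(ci, r, ha, ct) == 1
```

When checkcast returns False here, the corresponding situation in the rules is that neither checkcast rule applies, so the machine has no transition.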

4.20 Loading and Storing Object Fields 

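The two ‘get’ instructions below load a value from an object field onto the
operand stack. The two ‘put’ instructions store a byte or a short into a
field. There are no agetfield or aputfield instructions. The side conditions check
for stack underflow, null references, or a reference to an object of the wrong type.
Illegal field indices should be detected by the static semantics.

\[
[\mathit{bgetfield}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow r,\\
  \vdash ha(r) \Rightarrow \mathit{RegularObject}\ oh\ ft,\\
  \vdash ft(i) \Rightarrow v
\end{array}}
     {\vdash (\mathit{bgetfield}\ i, sp, os, ha) \Rightarrow (sp - 1, os \oplus \{sp - 1 \mapsto v\})}
\]
\[
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isRegularObject}(ha(r))
\]

\[
[\mathit{sgetfield}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 1), os(sp)) \Rightarrow r,\\
  \vdash ha(r) \Rightarrow \mathit{RegularObject}\ oh\ ft,\\
  \vdash (ft(i), ft(i + 1)) \Rightarrow (hi, lo),\\
  \vdash os \oplus \{sp - 1 \mapsto hi\} \oplus \{sp \mapsto lo\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{sgetfield}\ i, sp, os, ha) \Rightarrow (sp, os')}
\]
\[
\text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isRegularObject}(ha(r))
\]

\[
[\mathit{bputfield}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 2), os(sp - 1)) \Rightarrow r,\\
  \vdash os(sp) \Rightarrow v,\\
  \vdash ha(r) \Rightarrow \mathit{RegularObject}\ oh\ ft,\\
  \vdash ft \oplus \{i \mapsto v\} \Rightarrow ft',\\
  \vdash ha \oplus \{r \mapsto \mathit{RegularObject}\ oh\ ft'\} \Rightarrow ha'
\end{array}}
     {\vdash (\mathit{bputfield}\ i, sp, os, ha) \Rightarrow (sp - 3, ha')}
\]
\[
\text{if } (sp - 2 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isRegularObject}(ha(r))
\]

\[
[\mathit{sputfield}]\quad
\frac{\begin{array}{l}
  \vdash p2s(os(sp - 3), os(sp - 2)) \Rightarrow r,\\
  \vdash (os(sp - 1), os(sp)) \Rightarrow (hi, lo),\\
  \vdash ha(r) \Rightarrow \mathit{RegularObject}\ oh\ ft,\\
  \vdash ft \oplus \{i \mapsto hi\} \oplus \{i + 1 \mapsto lo\} \Rightarrow ft',\\
  \vdash ha \oplus \{r \mapsto \mathit{RegularObject}\ oh\ ft'\} \Rightarrow ha'
\end{array}}
     {\vdash (\mathit{sputfield}\ i, sp, os, ha) \Rightarrow (sp - 4, ha')}
\]
\[
\text{if } (sp - 3 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\land r \in \mathit{heapPointer}_{range} \land \mathit{isRegularObject}(ha(r))
\]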



4.21 Loading and Storing Static Objects 



Static objects are kept in the static area. The instructions bgetstatic, sgetstatic, 
bputstatic, and sputstatic are used to manipulate static objects. 

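\[
[\mathit{bgetstatic}]\quad
\frac{\begin{array}{l}
  \vdash p2s(hi_r, lo_r) \Rightarrow i,\\
  \vdash sa(i) \Rightarrow v,\\
  \vdash os \oplus \{sp + 1 \mapsto v\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{bgetstatic}\ hi_r\ lo_r, sp, os, sa) \Rightarrow (sp + 1, os')}
\quad \text{if } (sp + 1) \in \mathit{stackPointer}_{range}
\]

\[
[\mathit{sgetstatic}]\quad
\frac{\begin{array}{l}
  \vdash p2s(hi_r, lo_r) \Rightarrow i,\\
  \vdash (sa(i), sa(i + 1)) \Rightarrow (hi_v, lo_v),\\
  \vdash os \oplus \{sp + 1 \mapsto hi_v\} \oplus \{sp + 2 \mapsto lo_v\} \Rightarrow os'
\end{array}}
     {\vdash (\mathit{sgetstatic}\ hi_r\ lo_r, sp, os, sa) \Rightarrow (sp + 2, os')}
\quad \text{if } (sp + 1 \ldots sp + 2) \subseteq \mathit{stackPointer}_{range}
\]

\[
[\mathit{bputstatic}]\quad
\frac{\begin{array}{l}
  \vdash p2s(hi_r, lo_r) \Rightarrow i,\\
  \vdash os(sp) \Rightarrow v
\end{array}}
     {\vdash (\mathit{bputstatic}\ hi_r\ lo_r, sp, os, sa) \Rightarrow (sp - 1, sa \oplus \{i \mapsto v\})}
\quad \text{if } sp \in \mathit{stackPointer}_{range}
\]

\[
[\mathit{sputstatic}]\quad
\frac{\begin{array}{l}
  \vdash p2s(hi_r, lo_r) \Rightarrow i,\\
  \vdash (os(sp - 1), os(sp)) \Rightarrow (hi_v, lo_v),\\
  \vdash sa \oplus \{i \mapsto hi_v\} \oplus \{i + 1 \mapsto lo_v\} \Rightarrow sa'
\end{array}}
     {\vdash (\mathit{sputstatic}\ hi_r\ lo_r, sp, os, sa) \Rightarrow (sp - 2, sa')}
\quad \text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]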



4.22 Miscellaneous Instructions 

The breakpoint instruction pops the top two elements of the operand stack, 
interprets them as the high and low byte of a short and appends the short to 
the output stream. 

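    outputStream = [short];

\[
[\mathit{breakpoint}]\quad
\frac{\vdash p2s(os(sp - 1), os(sp)) \Rightarrow v}
     {\vdash (\mathit{breakpoint}, sp, os, \mathit{output}) \Rightarrow (sp - 2, \mathit{output} \mathbin{+\!\!+} [v])}
\quad \text{if } (sp - 1 \ldots sp) \subseteq \mathit{stackPointer}_{range}
\]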



4.23 Combining the Rules 



The semantics of the 25 subsets of the instruction set are specified by as many
different relations, such as $\stackrel{const}{\Longrightarrow}$. These different
relations are embedded in the relation $\stackrel{exec}{\Longrightarrow}$ by the
rules below. The relation $\stackrel{exec}{\Longrightarrow}$ also automatically
increments the program counter by one upon completing the execution of an
instruction, with a few exceptions detailed below.

The separation of the different categories of instructions shows that the speci- 
fication is modular: The configuration of the virtual machine has 12 components, 
which is quite large. However, the relation for many of the subsets uses only a 
small number of components, thus hiding the remaining components. 

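\[
[\mathit{exec}^{const}]\quad
\frac{\vdash (\mathit{constInst}, sp, os) \stackrel{const}{\Longrightarrow} (sp', os')}
     {\vdash (pc, ca, \mathit{constInst}, sp, os, fp, fa, hp, ha, pi, pt, sa, \mathit{output})
      \stackrel{exec}{\Longrightarrow} (pc + 1, sp', os', fp, fa, hp, ha, sa, \mathit{output})}
\]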



Most other relations defining subsets of the instruction set are embedded in
the relation $\stackrel{exec}{\Longrightarrow}$ in the same way as shown above. The
exception to this rule is formed by the relations
$\stackrel{return}{\Longrightarrow}$, $\stackrel{control}{\Longrightarrow}$,
$\stackrel{switch}{\Longrightarrow}$ and $\stackrel{invoke\ldots}{\Longrightarrow}$,
which calculate the new value of the program counter pc'. The automatic
increment of the program counter is thus suppressed.

\[
[\mathit{exec}^{return}]\quad
\frac{\vdash (\mathit{returnInst}, sp, os, fp, fa) \stackrel{return}{\Longrightarrow} (pc', sp', os', fp')}
     {\vdash (pc, ca, \mathit{returnInst}, sp, os, fp, fa, hp, ha, pi, pt, sa, \mathit{output})
      \stackrel{exec}{\Longrightarrow} (pc', sp', os', fp', fa, hp, ha, sa, \mathit{output})}
\]

4.24 Main Semantic Function



The function jsp defines the semantics of a JSP program as the transitive closure
of the relation $\stackrel{decode}{\Longrightarrow}$ (below). When given an initial
JSP machine configuration, jsp computes a list of successive configurations that
can be inspected.

    configuration = (programCounter, codeArea, stackPointer, operandStack,
                     framePointer, frameArea, heapPointer, heapArea,
                     progId, progTable, staticArea, outputStream);
    jsp :: configuration -> [configuration];

\[
\mathit{jsp}\ s_0 = (s_0 \stackrel{decode}{\Longrightarrow}{}^{*});
\]

The relation $\stackrel{decode}{\Longrightarrow}$ accesses the instruction at the
current program counter. The case analysis by the relation
$\stackrel{exec}{\Longrightarrow}$ decides to which category the current
instruction belongs and delegates the actual processing of the instruction to the
appropriate embedded relation.

\[
\stackrel{decode}{\Longrightarrow} :: (\mathit{configuration} \leftrightarrow \mathit{configuration});
\]

\[
[\mathit{decode}]\quad
\frac{\begin{array}{l}
  \vdash (pc, ca, ca(pc), sp, os, fp, fa, hp, ha, pi, pt, sa, \mathit{output})\\
  \quad \stackrel{exec}{\Longrightarrow} (pc', sp', os', fp', fa', hp', ha', sa', \mathit{output}')
\end{array}}
     {\begin{array}{l}
  \vdash (pc, ca, sp, os, fp, fa, hp, ha, pi, pt, sa, \mathit{output})\\
  \quad \stackrel{decode}{\Longrightarrow} (pc', ca, sp', os', fp', fa', hp', ha', pi, pt, sa', \mathit{output}')
\end{array}}
\]



A sample machine configuration, such as test, can be supplied as an argument
to jsp.
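The idea of computing the list of successive configurations can be sketched as a small Python driver. This is a toy illustration with loud assumptions: the configuration is cut down to four components (pc, code, sp, os), only two instructions are handled, the semantics given to bpush (push one byte) is assumed for the example, and the initial configuration is included in the resulting list.

```python
def n2b(n):
    return n % 128


def step(config):
    """One decode-and-execute step; returns None when no rule applies."""
    pc, code, sp, os = config
    if pc not in range(len(code)):
        return None                           # no instruction to decode
    instr = code[pc]
    if instr[0] == "bpush":                   # assumed semantics: push a byte
        return pc + 1, code, sp + 1, {**os, sp + 1: instr[1]}
    if instr[0] == "badd":
        if sp - 1 < 0:
            return None                       # stack underflow
        total = n2b(os[sp - 1] + os[sp])
        return pc + 1, code, sp - 1, {**os, sp - 1: total}
    return None                               # unknown instruction


def jsp(s0, step_fn):
    """Collect successive configurations until no further step is possible."""
    trace = [s0]
    while True:
        nxt = step_fn(trace[-1])
        if nxt is None:
            return trace
        trace.append(nxt)
```

A run such as jsp((0, code, -1, {}), step) yields one configuration per executed instruction, so intermediate machine states can be inspected after the fact.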



5 On the Relationship Between the JVM and the JSP 

The JSP is essentially a scaled down version of the JVM. However, the JSP 
byte codes are not a strict subset of the JVM and translating JVM byte codes 
into JSP byte codes presents some interesting problems. This section comments 
on the relationship between the two virtual machines and sketches a simplified 
process of translating Java class files into the tables required to run JSP code. 

The main problem of translating JVM byte codes into JSP byte codes is the
pervasive use of 32-bit data in Java programs. The translator built by JavaSoft
performs a sophisticated analysis to ensure that the computations performed by
the JSP have the same semantics as those carried out by the JVM. The results
of the analysis enable the translator to map certain integers and associated
operations onto bytes, and some onto shorts. The translator also inserts
instructions to support multiple precision arithmetic when genuine 32-bit
integers are needed.

The simplified translation to be described here assumes that all integers can 
be represented as shorts. We make no attempt to either identify opportunities 
for using bytes or to warn if shorts are too limited. 

The translation of Java class files into the tables required by the JSP consists 
of the following steps: 

— To allocate all statics in the staticArea, to create an index of all application 
programs in the progTable, and to gather the code sections of all methods in 
the codeArea. 

— For each application program to allocate a classTable.

— For each class to allocate a classObject with its objectHeader, a methodTable, 
a superTable, and an interfaceTable, and to decide on the layout of the fields 
in the instance of the class. 

— For each method to allocate a methodHeader, to gather the byte codes of the 
method and to decide on a start address of the method. 

— For each word offset, address or integer to convert it into a short. Depending 
on the sophistication of the translation process this may simply truncate all 
values, or restructure the byte code to deal with values that cannot be fit 
into 16 bits. 

— For each instruction to convert it as indicated below. 
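
The simplest variant of the conversion step above, truncation to 16 bits, can be sketched as follows. This is a minimal Python sketch under the assumption that shorts are 16-bit two's-complement values; the helper names fits_short and truncate_to_short are hypothetical and are not taken from the translator described in the text.

```python
# Assumption: a JSP short is a signed 16-bit two's-complement value.
SHORT_MIN, SHORT_MAX = -2**15, 2**15 - 1


def fits_short(value: int) -> bool:
    """Report whether a value can be represented as a short without loss."""
    return SHORT_MIN <= value <= SHORT_MAX


def truncate_to_short(value: int) -> int:
    """Keep the low 16 bits and reinterpret them as a signed short."""
    low = value & 0xFFFF
    return low - 0x10000 if low >= 0x8000 else low
```

A more careful translator would call something like fits_short first and restructure the byte code (or at least warn) when it returns False, rather than truncating silently.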



To present the translation of individual JVM byte codes into JSP byte codes 
in a reasonably succinct manner we use the following abbreviations: 



— byte, short, index, params and address stand for numeric values in the appro- 
priate range. 

— class, field, method, and static stand for the appropriate name. 

— [a|b|c] stands for exactly one of the words a, b or c.

We list all JVM instructions (on the left), and describe the equivalent
JSP instruction or sequence of instructions (on the right). 



Constant instructions, 
nop = nop;
bipush byte = spush 0 byte;
sipush short = spush (short div 256) (short mod 256);
aconst_null = aconst_null;
iconst_m1 = bconst_0, bconst_m1;
iconst_[0|1|2|3|4|5] = bconst_0, bconst_[0|1|2|3|4|5];
iconst short = bpush (short div 256), bpush (short mod 256);
iconst byte = bpush byte;

— The load, store and increment instructions. 

[a|i]load_[0|1] = [a|s]load [0|2];
[a|i]load_[2|3] = [a|s]load [4|6];
[a|i]load index = [a|s]load(2*index);
[a|i]store_[0|1] = [a|s]store [0|2];
[a|i]store_[2|3] = [a|s]store [4|6];
[a|i]store index = [a|s]store(2*index);
iinc index byte = sinc(2*index) byte;

— Stack instructions, 

dup = dup2; 

dup_x[1|2] = dup_x(2*16 + [2|4]);
dup2 = dup_x(4*16 + 4);
dup2_x[1|2] = dup_x(4*16 + [6|8]);
pop = pop2; 

pop2 = pop2, pop2; 

swap = swap2; 

— Array creation, load and store instructions,

anewarray class = anewarray class;
newarray [boolean|byte|short|int] = newarray [bit|byte|short|short];
arraylength = arraylength;
[a|b|i|s]load = [a|b|s|s]load;
[a|b|i|s]store = [a|b|s|s]store;

— Instructions for arithmetical, logical and conversion operations. 

i[add|sub|mul|div|rem] = s[add|sub|mul|div|rem];
i[shl|shr|ushr] = s[shl|shr|ushr];
i[and|or|xor] = s[and|or|xor];
i2b = s2b;
i2s = nop;

— The JVM Conditional branches translate into a number of JSP instructions, 

ifnonnull address = aconst_null, acmp, ifne address;
ifnull address = aconst_null, acmp, ifeq address;
if_[a|i]cmp[eq|lt|gt|ne|ge|le] address = [a|s]cmp, if[eq|lt|gt|ne|ge|le] address;
if[eq|lt|gt|ne|ge|le] address = s2b, if[eq|lt|gt|ne|ge|le] address;

goto address = goto address; 

— The JVM instructions tableswitch and lookupswitch are variable length in- 
structions. The tables may contain an arbitrary number of index/target or 
key/target pairs. 

tableswitch from to default {index |-> address} =
    tableswitch default from to {index |-> address};
lookupswitch size default {index |-> (key, address)} =
    lookupswitch default size {index |-> (key, address)};

— Exception handling, 
athrow = athrow; 
jsr address = jsr address; 
ret index = ret(2*index); 

— Instructions for method invocation.

invokeinterface params class method = invokeinterface params class method; 
invokespecial address = invoke address; 

invokestatic address = invoke address; 

invokevirtual params method = invokevirtual params method; 

[a|i]return = [a|s]return;

return = return; 

— Instructions for object creation and manipulation, 
new class = new class; 

instanceof class = instanceof class, b2s; 
checkcast class = checkcast class; 
getfield field = sgetfield field; 
putfield field = sputfield field; 
getstatic static = sgetstatic static; 
putstatic static = sputstatic static; 

— Miscellaneous instructions, 
breakpoint = breakpoint; 

— All other JVM instructions are unsupported. These are jsr_w, goto_w, wide,
monitorenter, monitorexit, multianewarray, and all instructions involving the
character, long, float, and double data types.
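
A handful of the rules listed above can be sketched as a small translation function. This is an illustrative Python sketch, not the sed/awk script mentioned in the text: the tuple encoding of instructions and the function name translate are assumptions made for the example, and only a few representative rules are covered. Negative operands of sipush are not considered, because the text does not say how div and mod treat them.

```python
def translate(instr):
    """Translate one JVM instruction (a tuple) into a list of JSP instructions."""
    op, *args = instr
    if op == "bipush":
        (byte,) = args
        return [("spush", 0, byte)]
    if op == "sipush":
        (short,) = args
        # High byte first, then low byte (assumes a non-negative operand).
        return [("spush", short // 256, short % 256)]
    if op in ("aload", "iload", "astore", "istore"):
        (index,) = args
        # References keep the 'a' prefix; integers become shorts ('s').
        jsp_op = ("a" if op[0] == "a" else "s") + op[1:]
        # Local variable indices are doubled.
        return [(jsp_op, 2 * index)]
    if op == "iinc":
        index, byte = args
        return [("sinc", 2 * index, byte)]
    if op == "dup":
        return [("dup2",)]
    if op == "pop2":
        return [("pop2",), ("pop2",)]
    if op == "i2s":
        return [("nop",)]
    if op == "ifnull":
        (address,) = args
        return [("aconst_null",), ("acmp",), ("ifeq", address)]
    raise ValueError(f"unsupported or unhandled JVM instruction: {op}")
```

For instance, translate(("sipush", 0x1234)) yields a single spush with operands 0x12 and 0x34, and translate(("iload", 3)) yields sload 6.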

We use Sun’s Java compiler from the Java Development Kit version 1.1 to
generate class files from sample Java programs. The translations sketched above 
have been implemented as a simple sed/awk script, such that the results of the 
translation can be used as sample input for the main semantic function jsp. This 
will be explored briefly in the next section. 

6 A Sample Program 

We have written a suite of simple Java programs, varying from quick sort to 
specific tests for the object system, to validate aspects of the semantics. The 
workings of the JSP semantics are best illustrated by exposing some details of a
representative program from our suite. The program below is a slightly modified
version of [?, Page 48]. The two calls to println have been added to show that
the program is working. Furthermore we have added the call to setColor to 
demonstrate the workings of multiple inheritance. 

public class Point { int x, y; };

public interface Colorable {
    void setColor( byte r, byte g, byte b );
}

public class ColoredPoint extends Point implements Colorable {
    byte r, g, b;

    public void setColor( byte rv, byte gv, byte bv ) {
        r = rv; g = gv; b = bv;
    }
}

public class test {
    public static void main( String[] args ) {
        Point p = new Point();
        ColoredPoint cp = new ColoredPoint();
        p = cp;
        System.out.println( p.x );
        Colorable c = cp;
        c.setColor( (byte) 0, (byte) 1, (byte) 2 );
        System.out.println( cp.b );
    }
}






Pieter H. Hartel, Michael J. Butler, and Moshe Levy 



The 12 components of the JSP virtual machine configuration necessary to execute
test.main are initialised as follows:

program counter The program counter is initialised to 0.

code area The code for all methods to be executed by the current application
program (which includes the initialiser for java.lang.Object) is gathered in
the code area. An extra instruction, whose task is to invoke the main method,
is added to the code area at address zero. This is represented as
0 |-> invoke(s2p(test.main_pc))

stack pointer The initial value of the stack pointer is argc.
argc :: stackPointer;
argc = 1;

operand stack Initially the operand stack is the same as argv.
argv :: operandStack;
argv = {0 |-> 0, 1 |-> 0};

frame pointer The initial value of the frame pointer is -1, to indicate that
the frame area is initially empty.

frame area The initial frame area is empty.

heap pointer The initial heap pointer is -1, indicating an empty heap.

heap The heap is initially empty.

application program index test_pi is the index in the application program table
of the current application program. The formal specification presently
does not specify a mechanism for switching application programs.

application program table machine_pt is the machine-wide mapping from application
program ids to class tables, providing one class table per application program.

static area machine_sa is the machine-wide area used to store static values. The
sample program does not have any static values.

initial output The initial output stream is empty.

test :: configuration;
test = (0, {0 |-> invoke(s2p(test.main_pc))} U machine_ca,
        argc, argv, -1, {}, -1, {}, test_pi, machine_pt, machine_sa, []);
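
The initial configuration can be pictured as a 12-field record. The following Python sketch is only an illustration under stated assumptions: the class name Configuration, the function initial_configuration and the tuple encoding of the invoke instruction are hypothetical, maps are modelled as dictionaries, and the s2p conversion is left out.

```python
from typing import NamedTuple


class Configuration(NamedTuple):
    pc: int       # program counter
    ca: dict      # code area: address -> instruction
    sp: int       # stack pointer
    os: dict      # operand stack: index -> value
    fp: int       # frame pointer
    fa: dict      # frame area
    hp: int       # heap pointer
    ha: dict      # heap
    pi: int       # application program index
    pt: dict      # application program table
    sa: dict      # static area
    output: list  # output stream


def initial_configuration(main_pc, machine_ca, pi, machine_pt, machine_sa):
    """Build the start configuration: address 0 invokes the main method."""
    ca = {0: ("invoke", main_pc), **machine_ca}
    argc = 1
    argv = {0: 0, 1: 0}
    return Configuration(0, ca, argc, argv, -1, {}, -1, {},
                         pi, machine_pt, machine_sa, [])
```

Using an immutable record mirrors the specification, where each step of the relation yields a new configuration rather than updating one in place.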

The JSP byte codes for the main method of class test are shown below. Instead 
of calling the println method of the library class System, we use the breakpoint
instruction to inspect the configuration of the machine. 

test.main_ca :: codeArea;

test.main_ca = {test.main_pc - 1 |-> MethodHeader False False 8 2 6,
    test.main_pc + 0 |-> new Point_ci,
    test.main_pc + 1 |-> dup2,
    test.main_pc + 2 |-> invoke(s2p Point.init_pc),
    test.main_pc + 3 |-> astore 2,
    test.main_pc + 4 |-> new ColoredPoint_ci,
    test.main_pc + 5 |-> dup2,
    test.main_pc + 6 |-> invoke(s2p ColoredPoint.init_pc),
    test.main_pc + 7 |-> astore 4,
    test.main_pc + 8 |-> aload 4,
    test.main_pc + 9 |-> astore 2,
    test.main_pc + 10 |-> nop,
    test.main_pc + 11 |-> aload 2,
    test.main_pc + 12 |-> sgetfield Point.x_fi,
    test.main_pc + 13 |-> breakpoint,
    test.main_pc + 14 |-> aload 4,
    test.main_pc + 15 |-> astore 6,
    test.main_pc + 16 |-> aload 6,
    test.main_pc + 17 |-> bconst_0, test.main_pc + 18 |-> bconst_0,
    test.main_pc + 19 |-> bconst_0, test.main_pc + 20 |-> bconst_1,
    test.main_pc + 21 |-> bconst_0, test.main_pc + 22 |-> bconst_2,
    test.main_pc + 23 |-> invokeinterface 8 Colorable_ii Colorable.setColor_mi,
    test.main_pc + 24 |-> nop,
    test.main_pc + 25 |-> aload 4,
    test.main_pc + 26 |-> sgetfield ColoredPoint.b_fi,
    test.main_pc + 27 |-> breakpoint,
    test.main_pc + 28 |-> return};

The execution of the program can be expressed simply as jsp(test). The latos 
tool makes it possible to trace the execution of the program, and to experiment 
with different initial configurations. 

The program starts by creating two heap objects, one representing a Point 
and the second representing a ColoredPoint. The objects are properly initialised 
by a chain of calls to the initialisers of the super classes. The most interesting 
instruction is the invokeinterface, which has to discover that the instance of 
ColoredPoint indeed implements the setColor method. 

The program causes two values to be appended to the output stream (via 
the breakpoint instruction). The values are 0 (because the coordinates of the 
class Point are initialised to 0) and 2 (because ColoredPoint. setColor assigns this 
value to the field cp.b). 







7 Conclusions and Future Work 

The result of formalising the operational semantics of the JSP is a specification 
that is: 

— succinct, because it is shorter and more detailed than the natural language 
documents. 

— clear, because the rules are not open to more than one interpretation. 

— executable, because a program can be generated automatically from the 
specification, which can subsequently be executed to validate and explore 
the behaviour of sample Java programs. 

— consistent, because the tools available for the notation used check well 
formedness, types and source dependency. 

— modular, because subsets of rules can be considered in isolation.

— large, because it has to cope with 25 groups of 124 different JSP instructions. 

— not difficult to read, because the rules describing the semantics of many 
instructions are similar. 

The fact that our specification is executable allows implementors to experi- 
ment with Java programs and byte codes, inspect the configuration of the JSP 
and generally sharpen their understanding of the mechanisms. Without tool sup- 
port it would be impossible to construct a derivation tree for anything but the 
most trivial Java programs. With the help of our latos tool, our specification 
could be used to automatically construct derivation trees for small to medium 
sized programs. 

We hope to be able to make our complete specification available on the Web, 
so that others may download the specification and the latos tool and use these
resources whilst implementing a JSP. 

In future we hope to gain access to a complete operational semantics of 
the JVM, formally specify the JVM to JSP translator and attempt to give a 
correctness proof of the translator with respect to the semantics of the JVM 
byte codes and that of the JSP byte codes. 

We have not considered the static semantics of a JSP, that is, a specification of
properties of JSP programs that can be checked statically, for example by the
JVM to JSP byte code translator, or the byte code verifier. An important goal
would be to investigate which static properties of the JVM are preserved by
the JVM to JSP translator. The work of Stata and Abadi [?] offers a promising
basis for this.


Abstract. We propose in this paper a definition of the semantics of
Java programs which can be used as a basis for the standardization of 
the language and of its implementation on the Java Virtual Machine. 
The definition provides a machine and system independent view of the 
language as it is seen by the Java programmer. It takes care to directly 
reflect the description in the Java language reference manual so that the 
basic design decisions can be checked by standardizers and implementors 
against a mathematical model. 

Our definition is the basis for a related definition we give in a sequel to 
this paper for the implementation of Java on the Java Virtual Machine as 
described in the language and in the Virtual Machine reference manuals. 



1 Introduction 

In this chapter we formalize the semantics of Java by a system independent, 
purely mathematical yet easily manageable model, which reflects directly the 
intuitions and design decisions underlying the language as described in Java’s 
language reference manual (LRM) [?]. Our goal is to contribute to a rigorous
yet readable definition of the entire language, which supports the programmer’s 
understanding of Java programs. At the same time the definition should provide 
a basis for the standardization and clarification of critical language features, for 
the specification and evaluation of variations or extensions of the language and 
for the mathematical analysis and comparison of Java implementations. In par- 
ticular we aim for a model that is amenable to both mathematical and computer 
assisted proofs and to experimental validation of the correctness of compilation 
schemes to Java Virtual Machine (JVM) code and of safety properties of Java 
programs when executed on the JVM. 

* A preliminary version of this paper has been presented to the IFIP WG 2.2
Meeting in Graz (22.-26.9.1997) and to the Workshop on Programming Languages in
Avendorf/Fehmarn (24.-26.9.1997).

These goals oblige us to abstract the central ideas of Java’s LRM into a
transparent but rigorous form, whose adequacy can be recognized (or falsified in the
sense of Popper [?]) by inspection, i.e., by a direct comparison of the
mathematical definitions with the verbal descriptions in the manual (see the discussion on
ground models in [?]). To be able to establish the required simplicity and
faithfulness of such abstractions, one needs a modeling technique which provides the
following two possibilities:

— to express the basic language concepts (its objects and operations) directly, 
without encoding, i.e., as abstract entities as they appear in the LRM, 

— to model basic actions on the level of abstraction of the LRM, i.e., as local 
modifications with clear preconditions and effects and avoiding any a priori 
imposed static representations of actions-in-time. 

Gurevich’s Abstract State Machines (ASMs), previously called Evolving Algebras
(see [?]), provide the fundamental concept for such a modeling technique.
ASMs have been successfully used to model the semantics and implementation
of programming languages as different as Prolog [?], Occam [?], VHDL [?],
C++ [?] and others. Furthermore, ASMs are effective in modeling architectures
[?], protocols [?], control software [?], and by being amenable to execution
(see [?] for an ASM interpreter) they can be used for high-level validation. See
[?] for a survey. ASMs have a simple mathematical foundation [?], which justifies
their intuitive understanding as “pseudo-code over abstract data” so that
the practitioner can use them correctly and successfully without having to go
through any special formal training. Therefore we invite the reader to go ahead
with reading our specification and to consult the formal definition of ASMs in
[?] only should the necessity arise.

We formally define the semantics of Java by providing an ASM which in- 
terprets arbitrary Java programs. A Java program consists of a set of classes. 
In the use of a class there are three phases: parsing, elaboration, and execu- 
tion. Parsing determines the grammatical form yielding an abstract syntax tree. 
Elaboration, the static phase, determines whether the class is well-typed and 
well-formed and records such information as annotations in the abstract syntax 
tree. Execution, the dynamic phase, loads, links, and executes the code of the 
class by evaluating expressions and modifying the memory. Corresponding to 
these phases, a full mathematical definition of Java needs a grammar, a static 
and a dynamic semantics. The grammar is well defined in the LRM [?]. Numerous
authors have formalized the static semantics of sequential Java, in particular
its type soundness [?]. The dynamic semantics given in these papers
cover only a small structured sublanguage of sequential Java and do not
consider the interaction of jump statements (like break), exception handling and
concurrency, which we treat in full. We therefore concentrate in this paper on a 
complete but nevertheless transparent mathematical definition of the dynamic 
semantics of Java. 

Two features characterize our modeling of the dynamics of Java programs: it 
is run-time instead of syntax oriented and it comes with a systematic separation 
of static and dynamic concerns. 

To let the dynamic aspects stand out as clearly as possible, we relegate 
compile-time matters to static functions as much as we can without making the 
specification unreadable for the Java practitioners with no training in formal 
methods. As is well known such a separation of statics and dynamics also lays 
the ground for efficient implementations of the static features for program
interpretation and for the generation of program debuggers, animation tools etc. (see
for example [?]). In addition it has led to an interesting and novel integration,
into ASM specifications, of various useful methods from functional programming 
and algebraic specifications for the definition of static (compile-time) functions. 

After some experimentation we decided to stick strictly to a run-time and not
syntax-directed modeling. Structural methods (like SOS [?], natural semantics
[?], action semantics [?], etc.) are known to work well for the definition of
languages where the control flow essentially follows the syntax (tree) structure
with only little involvement of environment information (as is the case for
example for purely functional languages or for strongly syntax supported languages).
Structural methods offer however no advantage in cases of languages like Java 
where the participation of the run-time environment in determining the program 
control flow becomes more complex and when concurrency features — which are 
not syntax driven — enter the scene. The decision for a strictly process-oriented 
modeling throughout the entire Java language provides the programmer with 
a uniform view of the intricate interaction of the different language features 
like jumps, returns, exceptions, concurrency and synchronization. The use of 
ASMs makes this view particularly transparent (and thus easily comparable 
with the verbal explanations in the LRM): ASM specifications concentrate on 
local changes which avoids having to carry, for a given action, global contexts 
which remain constant for this action. 

Before we proceed to the technical overview of the paper we want to make 
clear that our paper is not an introduction to (programming in) Java; the in- 
tended reader is familiar with Java: a programmer, a standardizer, implementor 
or teacher who looks for a rigorous but easy to understand language definition. 



1.1 Overview 

To make the complete dynamic semantics of Java manageable, we factor it into 
five sublanguages, by isolating orthogonal language features, namely imperative, 
procedural, object-oriented, exception handling and concurrency features. This
can be done in such a way that each corresponding ASM model is a conservative
extension of its predecessor. We found it interesting to discover at a later stage
of our work on the Java language that an analogous modular decomposition can
be given also for models of the JVM [?].

Section 2 defines the basic ASM Java_I for Java’s imperative core, which
is essentially a while language. It contains statements and expressions over
Java’s primitive types. This section provides an introduction to our approach
and notation.

In Sect. 3 we upgrade Java_I to Java_C by including Java’s classes; Java_C
supports class fields, class methods, and class initializers. Thus, Java_C defines an
object-based sublanguage of Java, which supports procedural abstraction and
(module-) global variables.

In Sect. 4 we extend Java_C to Java_O by including Java’s (real) object-oriented
concepts, namely instances, instance creation, instance field access, instance
method calls with late binding, casts, and null pointers.

We then extend Java_O with exceptions, resulting in the model Java_E. We
specify which exception will be thrown when semantic constraints are violated.
We introduce Java’s throw and try/catch/finally statement, and we exhibit
the interaction of exception handling with other language constructs.

In Sect. 5 we move from sequential Java to concurrent Java; the corresponding
ASM model Java_T introduces Java’s lightweight processes, called threads,
their synchronization mechanism using locks, and their stopping, waiting and
notification mechanism. We study two complementary memory models: the first
one uses only the main memory for storing objects, the second model uses the
local working memory as much as possible. For ‘best practice programs’ both
agree.

In order not to lengthen the definition of our models by tedious and routine
repetitions, we skip those language constructs (in particular in Java_I), which
can easily be reduced to the core constructs dealt with explicitly in our models;
examples are alternative control structures (like for, do, switch), pre- and
postfix operators (++, --), conditional operators (&&, ||), assignments combined with
operations (+=, -=, etc.), variable initialization and similar expressive sugar. And
since most of the object-oriented concepts of Java_O apply equally well to arrays
and strings, we do not treat them either.

We do not consider Java packages, compilation units, and the visibility of 
names. We abstract from these aspects because they do not influence the
dynamic semantics. We do not deal with input/output questions. We also do not
consider the loading and linking of classes nor garbage collection. This is in 
accordance with the usual understanding of the dynamic semantics of program- 
ming languages. Yet, in Java these aspects are semantically visible: Dynamic 
loading and linking might raise exceptions, and in the presence of finalize 
methods also garbage collection is semantically visible. We plan to include these 
interesting aspects in later stages of the project. 

2 The Imperative Core of Java 

In this section we define the basic model Java_I, which defines the semantics
of the sequential imperative core of Java with statements (appearing in Java’s 
method bodies) and expressions (appearing in statements) over Java’s primitive 
types. 

2.1 Signature 

For each of our models we start with an arbitrary but fixed Java program. We 
separate standard compile-time matters from run-time issues by assuming that 
the program is given in a form in which it appears after parsing and elaboration, 
namely as an annotated abstract syntax tree. In this way we can abstract from
the peculiarities of Java’s concrete syntax and rely upon a series of useful
syntactical simplifications which will be mentioned as we proceed in building our
models.

Fig. 1 Abstract Java Syntax for Java_I

Exp    ::= Lit | Uop Exp | Exp Bop Exp | Var | Var = Exp | Exp ? Exp : Exp
Stm    ::= ; | Exp; | Lab : Stm | break Lab; | continue Lab;
         | if (Exp) Stm else Stm | while (Exp) Stm | Block
Block  ::= { Type Var; ... Type Var; Stm ... Stm }
Phrase ::= Exp | Stm | finished

The abstract syntax of Java’s imperative core is defined in Fig. 1. It can also
be viewed as defining corresponding domains (also called universes) of Java_I.
Although in our ASMs we will extend some of these domains by a small number of
auxiliary constructs which do not appear in Java’s syntax, we use the names of
Java’s grammatical constructs also as names for the corresponding (extended)
ASM universes. We are sure the reader will be thankful for the simplified
notation provided by this naming convention. Usually we denote domains by words
beginning with a capital letter and write dom for elements of Dom, i.e.
assuming without further mentioning that dom ∈ Dom. Figure 1 uses some additional
universes, which represent basic syntactic constructs of Java, namely:

Lit, Bop, Uop, Type, Var, Lab 

for Java’s literals (except strings), Java’s binary operators (except assignment 
and not including conditional operators), Java’s unary operators (except prim- 
itive cast and pre- and postfix operators, but including all primitive widening 
and narrowing conversions), Java’s primitive types, local variables, and labels, 
respectively. 

As a result of the parsing and elaboration of the given Java program, no 
variable is declared twice, i.e., there are no hidden variables; all conversions are 
made explicit by applying the corresponding unary conversion operator; local 
variable initializers are syntactically reduced to ordinary assignments, following 
the variable declarations, which are all shifted to the beginning of blocks. 

To separate as much as possible the dynamic (run-time) aspects from the 
static aspects the compiler (parsing and elaboration) can take care of, we use 
the idea (which is taken from the work on Occam [?]) to view program execution
by a thread as a walk of the thread through the program’s annotated abstract 
syntax tree: at each node the corresponding task is executed and then the control 
flow proceeds to the next task. The reader should keep in mind that the nodes 
represent occurrences of program constructs (phrases) and that all the functions 
we are going to define are defined on such occurrences of program constructs. 
The following dynamic function task, an abstract program counter, always points
to the current phrase to execute. 



task : Phrase 

The abstract program counter task must be updated according to Java’s con- 
trol flow. For sequential Java the control flow is fixed. As a matter of fact, for 
any statement and expression the LRM defines which substatement or subex- 
pression to evaluate first and which expression or statement — depending on the 
context — to evaluate next (if any) . This is captured by the following two static 
functions fst and nxt, which yield finished, if there is no first or next phrase 
to execute. (The definition of these functions, belonging to the compiler, will be 
given below by a recursion on abstract syntax trees.) 

fst, nxt : Phrase → Phrase

Proceeding from one task to the next task in accordance to Java’s (uncondi- 
tional) control flow is thus reduced to the following macro: 

proceed = task := nxt (task) 

Statement execution and side-effects of expression evaluation typically
update the local environment, formalized (for Java_I) using a dynamic function

loc : Var → Value

which captures the association between local variables and their values (bound
to the given method activation). The universe Value, defined by

Value ::= Bool | Integers | Floats

contains Java’s primitive values: booleans, integers in specific ranges, and float- 
ing point numbers according to IEEE 754. For simplicity, we identify Java’s 
booleans with the corresponding ASM values true and false, and abbreviate in 
our formulae often bool = true to bool. 

The storage of intermediate values of expressions, which are computed to be 
used as arguments or operands in larger expressions or to affect the conditional 
control flow among statements, is formalized using a dynamic function on the 
subset Exp of Phrase: 



val : Exp → Value

This concludes the definition of the signature of Java_I. Minor additions
pertaining only to some special constructs will be presented in the corresponding
sections.

2.2 Transition Rules

Transition rules describe how the states of Java_I, here its dynamic functions
task, loc, and val, change over time by evaluation of expressions and execution
of statements.

The initial state of Java_I is defined by the given phrase, which defines also
the static functions fst and nxt. In particular, we assume that nxt(phrase) =
finished and that task points to the first phrase to be executed, formally task =
fst(phrase). The functions loc and val are everywhere undefined.

The run terminates, if no rule can be executed, because the preconditions of 
all rules evaluate to false. If the execution completes normally, i.e., without any 
run-time violation, the ASM reaches: task = finished. 



Expressions 

The expressions of Java’s imperative core, except the conditional operators, are
evaluated from left to right and from innermost to outermost. This is described
in Chap. 15.6 of Java’s LRM [?]. (In the remainder of this chapter, we will
abbreviate citations like this one using the ‘§’ sign, writing (§ 15.6) to cite the
corresponding chapter of the LRM. This should help the reader to check the
correctness of our ASM formalization by comparing it with the LRM.)

We capture this ‘postfix’ evaluation order as follows: When the expression exp 
is going to be executed, we start with task pointing to fst(exp), then we repeat 
applying nxt on task until task points to exp. During this process, we evaluate 
any expression to which task points and assign the computed expression value 
to the task using val. 

This evaluation order is reflected in the recursive definition of the functions fst
and nxt: If the expression exp does not have any subexpression, we set fst(exp) =
exp. Otherwise let exp have the form f(exp_1, ..., exp_n), where f denotes any
n-ary expression constructor, but not the conditional expression. We set fst(exp) =
fst(exp_1), nxt(exp_i) = fst(exp_{i+1}) for 0 < i < n, and nxt(exp_n) = exp. For the
special case of conditional expressions, see below.
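
The recursive definition of fst and nxt can be made concrete with a short sketch. The Python code below is an illustration only: the class Node and the helpers build_nxt and walk are hypothetical names introduced for the example, each Node object stands for one occurrence of an expression, and conditional expressions are not handled.

```python
class Node:
    """One occurrence of an expression in the abstract syntax tree."""

    def __init__(self, label, *children):
        self.label = label
        self.children = children


FINISHED = "finished"


def fst(exp):
    # A leaf is its own first phrase; otherwise descend into exp_1.
    return exp if not exp.children else fst(exp.children[0])


def build_nxt(root):
    """Tabulate nxt for every occurrence below root."""
    nxt = {root: FINISHED}  # nxt(phrase) = finished for the whole phrase

    def visit(exp):
        kids = exp.children
        for i, kid in enumerate(kids):
            # nxt(exp_i) = fst(exp_{i+1}) for i < n, and nxt(exp_n) = exp
            nxt[kid] = fst(kids[i + 1]) if i + 1 < len(kids) else exp
            visit(kid)

    visit(root)
    return nxt


def walk(root):
    """Follow task := nxt(task) from fst(root) until finished."""
    nxt = build_nxt(root)
    order, task = [], fst(root)
    while task is not FINISHED:
        order.append(task)
        task = nxt[task]
    return order
```

For the expression 1 + 2 * y, built as Node("1+2*y", Node("1"), Node("2*y", Node("2"), Node("y"))), the walk visits 1, 2, y, 2*y and finally the whole sum, which is the postfix order described above.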

This establishes the required control flow. It remains to specify the normal 
evaluation of expressions (in Java_I constants and operator terms, variables and
conditional expressions) by transition rules. 



Evaluating constants and operators. The value lit of every occurrence lit 
of a literal is as defined in § 15.7.1. Its dynamic semantics is defined by the 
following transition rule: 

if task is lit then (Literal)
    val(task) := lit
    proceed

The macro ‘is’, defined below, tests whether task points to lit. The macro
will be refined in a later model.

task is phrase = (task = phrase)

The reader should keep in mind that by our typing convention for elements
of universes, task is lit stands for task = lit ∧ lit ∈ Lit. Here lit stands for a variable,
which has to be instantiated by an element of Lit. Similarly for all rules below.

The value of a unary expression with operation symbol ⊖ is defined by applying
the corresponding semantic operation ⊖ (a static function for Javax, which
is defined in the LRM) to the result of the operand (§ 15.13, 15.14).

if task is (⊖ exp) then                                      (UnaryExp)
    val(task) := ⊖(val(exp))
    proceed

The value of a binary expression with operation symbol ⊗ is defined by applying
the corresponding semantic operation ⊗ to the results of both operands.
The rule cannot be executed if the binary operator is an integer division or
remainder operation (denoted by / and %) and the value of the second operand is
0 (§ 15.13, 15.14). (See the refinement of BinaryExp for the exception case.) The
transition rule is defined by:

if task is (exp_1 ⊗ exp_2) ∧ (⊗ ∈ {/, %} ⇒ val(exp_2) ≠ 0) then   (BinaryExp)
    val(task) := val(exp_1) ⊗ val(exp_2)
    proceed



Using variables. The value of a variable is the value bound under the name 
of the variable in the local environment (§ 15.13.1). 

if task is var then                                          (VarAcc)
    val(task) := loc(var)
    proceed

The value of a simple assignment expression is the value of its right hand 
side. The execution of the assignment operator replaces the existing value bound 
under the name of the variable of the left hand side in the local environment by 
the result of the right hand side (§ 15.25.1). 

if task is (var = exp) then                                  (VarAss)
    loc(var) := val(exp)
    val(task) := val(exp)
    proceed
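The five rules Literal, UnaryExp, BinaryExp, VarAcc and VarAss can be read as the branches of a single step loop. The following is a minimal sketch under stated assumptions: `Node`, `link`, `run` and `java_div` are hypothetical names introduced for illustration, the operator tables cover only a few operators, and a false BinaryExp guard is reported by raising an error, since this sketch has no exception rules.

```python
from typing import Any, Dict, List, Optional


def java_div(x: int, y: int) -> int:
    """Integer division truncating toward zero, as in Java."""
    q = abs(x) // abs(y)
    return q if (x >= 0) == (y >= 0) else -q


BINARY = {
    "+": lambda x, y: x + y,
    "*": lambda x, y: x * y,
    "/": java_div,
    "%": lambda x, y: x - y * java_div(x, y),
}
UNARY = {"-": lambda x: -x, "!": lambda x: not x}


class Node:
    """One occurrence of an expression in the syntax tree."""

    def __init__(self, kind: str, payload: Any = None,
                 subs: Optional[List["Node"]] = None):
        self.kind, self.payload, self.subs = kind, payload, subs or []
        self.nxt: Optional["Node"] = None  # static successor, set by link()


def fst(node: Node) -> Node:
    return node if not node.subs else fst(node.subs[0])


def link(node: Node) -> None:
    """Precompute nxt for all subexpressions of node."""
    for i, sub in enumerate(node.subs):
        link(sub)
        sub.nxt = node if i == len(node.subs) - 1 else fst(node.subs[i + 1])


def run(exp: Node, loc: Dict[str, Any]) -> Any:
    link(exp)
    val: Dict[Node, Any] = {}
    task = fst(exp)
    while True:
        if task.kind == "lit":                        # Literal
            val[task] = task.payload
        elif task.kind == "var":                      # VarAcc
            val[task] = loc[task.payload]
        elif task.kind == "unary":                    # UnaryExp
            val[task] = UNARY[task.payload](val[task.subs[0]])
        elif task.kind == "binary":                   # BinaryExp
            left, right = val[task.subs[0]], val[task.subs[1]]
            if task.payload in ("/", "%") and right == 0:
                raise RuntimeError("BinaryExp guard is false: no rule fires")
            val[task] = BINARY[task.payload](left, right)
        elif task.kind == "assign":                   # VarAss
            loc[task.payload] = val[task.subs[0]]
            val[task] = val[task.subs[0]]
        else:
            raise ValueError(f"unknown expression kind: {task.kind}")
        if task is exp:
            return val[exp]
        task = task.nxt                               # proceed


# x = (y + 2) * -3
prog = Node("assign", "x", [
    Node("binary", "*", [
        Node("binary", "+", [Node("var", "y"), Node("lit", 2)]),
        Node("unary", "-", [Node("lit", 3)]),
    ])
])
env = {"y": 4}
value = run(prog, env)
```

After the run, `value` and `env["x"]` both hold the value of the right-hand side, matching the two updates of VarAss.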



Evaluating conditional expressions. To determine the value of a conditional 
expression requires two steps. The condition is evaluated first: if its value is true, 
the value of the conditional expression is the value of the second expression, 
otherwise it is the value of the third expression (§ 15.22-24). 

Processing a phrase in several steps means that we have to associate several
rules with the same phrase, which are executed at different times. In order to
distinguish the rules syntactically, we associate the rules not only with the single
phrase but with (some of) its subphrases, too. Here, we associate a rule with
the conditional expression and one with the auxiliary subphrases of form (exp :).
The static control flow is defined in Fig. 2.

Fig. 2 Normal control flow of a conditional expression / block

let exp = exp_1 ? exp_2 : exp_3 : in      let stm = {type var_1; ...; stm_1 ... stm_n} in
fst(exp) = fst(exp_1)                     fst(stm) = fst(stm_1)
nxt(exp_1) = exp                          nxt(stm_i) = fst(stm_{i+1}), 0 < i < n
nxt(exp_i) = exp_i :, i ∈ {2, 3}          nxt(stm_n) = nxt(stm)

The rule for the condition triggers the evaluation of the second or third 
expression. 

if task is exp_1 ? exp_2 : exp_3 : then                      (IfExp)
    if val(exp_1) then task := fst(exp_2)
    else task := fst(exp_3)

The rule for the second and third expression assigns the value of the evaluated 
subexpression to the immediately enclosing conditional expression and proceeds. 

if task is exp : then                                        (ThenElseExp)
    val(if(task)) := val(exp)
    task := nxt(if(task))

The auxiliary static function if always points from a then or else (sub-)expression
to its father, namely the conditional expression.



Statements 

The sequence of execution of a Java program is controlled by statements. We
distinguish in Javax three statement kinds: those which transfer control
unconditionally, those which transfer control conditionally, and those which transfer
control abruptly. The latter are described in Sect. 2.3.



Unconditional transfer of control. Statements whose only effect is to trans- 
fer control unconditionally do not have transition rules - their effect is already 
precompiled into the functions fst and nxt. 

A block is executed by executing each of the statements in order from first
to last (§ 14.2), see Fig. 2. We abstract from executing variable declarations
because in our ASM the assignment of a value to a variable implicitly enlarges
the domain of the environment, provided the variable is not already in the
environment’s domain. Variable access is always defined, since the elaboration phase
assures (due to the rules of definite assignment, § 16) that every local variable
is assigned before it is used. We can also abstract from deleting variables from
the environment, because we know that in the annotated syntax tree no variable
is declared twice.

Fig. 3 Normal control flow of an empty statement / an expression statement /
a labeled statement

let stm = ; in           let stm = exp; in          let stm = lab : stm_1 in
fst(stm) = nxt(stm)      fst(stm) = fst(exp)        fst(stm) = fst(stm_1)
                         nxt(exp) = nxt(stm)        nxt(stm_1) = nxt(stm)

Fig. 4 Normal control flow of if / while

let stm = if (exp) stm_1 else stm_2 in     let stm = while (exp) stm_1 in
fst(stm) = fst(exp)                        fst(stm) = fst(exp)
nxt(exp) = stm                             nxt(exp) = stm
nxt(stm_i) = nxt(stm), i ∈ {1, 2}          nxt(stm_1) = fst(exp)



An empty statement does nothing (§ 14.5). The control flow simply skips the
empty statement. Figure 3 shows its control flow.

An expression statement is executed by evaluating the expression (§ 14.7).
Figure 3 shows its control flow. (In the JVM an additional action is taken to
discard the value because it is no longer needed.)

A labeled statement is executed by executing the immediately contained
statement (§ 14.6), see Fig. 3.



Conditional transfer of control. An if-else statement is executed by first 
evaluating the expression. Execution continues by making a choice based on the 
resulting value. If the value is true, the first contained statement is executed, 
otherwise the second contained statement is executed (§ 14.8). Figure 4 captures
the static aspect of the control flow. The dynamic aspect is described by the
following transition rule:

if task is if (exp) stm_1 else stm_2 then                    (IfStm)
    if val(exp) then task := fst(stm_1)
    else task := fst(stm_2)

A while statement is executed by first evaluating the expression. Execution
continues by making a choice on the resulting value. If the value is true, then the
contained statement is executed, otherwise no further action is taken (§ 14.10),
see Fig. 4 for the definition of fst and nxt.

if task is while (exp) stm then                              (While)
    if val(exp) then task := fst(stm)
    else task := nxt(task)
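To see how the static functions of Figs. 2 to 4 interact with the rules IfStm and While, the following sketch runs a small loop. It rests on several assumptions: expressions are treated as atomic Python callables on the environment (abstracting from the expression rules), blocks are non-empty, and `Phrase`, `link`, `run`, `exp` and `assign` are hypothetical helper names.

```python
from typing import Any, Callable, Dict, Optional

FINISHED = "finished"


class Phrase:
    def __init__(self, kind: str, *subs: "Phrase",
                 action: Optional[Callable[[Dict[str, Any]], Any]] = None):
        self.kind, self.subs, self.action = kind, list(subs), action
        self.nxt: Any = None


def fst(p: Phrase) -> Phrase:
    # An atomic expression is its own first phrase; each statement kind used
    # here starts with the first phrase of its first component.
    return p if p.kind == "exp" else fst(p.subs[0])


def link(p: Phrase, nxt: Any) -> None:
    """Precompute nxt following Figs. 2 to 4."""
    p.nxt = nxt
    if p.kind == "expstm":
        link(p.subs[0], nxt)                  # nxt(exp) = nxt(stm)
    elif p.kind == "block":
        for i, s in enumerate(p.subs):
            is_last = i == len(p.subs) - 1
            link(s, nxt if is_last else fst(p.subs[i + 1]))
    elif p.kind == "if":
        cond, s1, s2 = p.subs
        link(cond, p)                         # nxt(exp) = stm
        link(s1, nxt)
        link(s2, nxt)
    elif p.kind == "while":
        cond, body = p.subs
        link(cond, p)                         # nxt(exp) = stm
        link(body, fst(cond))                 # nxt(stm_1) = fst(exp)


def run(prog: Phrase, env: Dict[str, Any]) -> Dict[str, Any]:
    link(prog, FINISHED)
    val: Dict[Phrase, Any] = {}
    task: Any = fst(prog)
    while task is not FINISHED:
        if task.kind == "exp":
            val[task] = task.action(env)
            task = task.nxt                   # proceed
        elif task.kind == "if":               # IfStm
            cond, s1, s2 = task.subs
            task = fst(s1) if val[cond] else fst(s2)
        elif task.kind == "while":            # While
            cond, body = task.subs
            task = fst(body) if val[cond] else task.nxt
        else:
            raise ValueError(f"no rule for phrase kind {task.kind}")
    return env


def exp(f: Callable[[Dict[str, Any]], Any]) -> Phrase:
    return Phrase("exp", action=f)


def assign(name: str, f: Callable[[Dict[str, Any]], Any]) -> Phrase:
    return Phrase("expstm", exp(lambda env: env.__setitem__(name, f(env))))


# while (i < 3) { s = s + i; i = i + 1; }  if (s == 3) r = "ok"; else r = "no";
loop = Phrase("while", exp(lambda e: e["i"] < 3),
              Phrase("block",
                     assign("s", lambda e: e["s"] + e["i"]),
                     assign("i", lambda e: e["i"] + 1)))
check = Phrase("if", exp(lambda e: e["s"] == 3),
               assign("r", lambda e: "ok"), assign("r", lambda e: "no"))
final_env = run(Phrase("block", loop, check), {"i": 0, "s": 0})
```

Blocks and expression statements never become the current task in this sketch, which reflects that they have no transition rules of their own.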
2.3 Abrupt Transfer of Control

The preceding subsection describes the normal control flow, in which certain 
steps of computations are carried out. Normal control flow can be abrupted 
in Javax by the restricted jump statements break and continue. Upon switch 
from normal to abrupted mode, the execution of one or more phrases may be 
terminated before all steps of their normal mode of execution have completed, 
e.g., only part of the statements of a block are executed because a break or 
continue was encountered which transfers control out of the block. Phrases 
which do not terminate normally are said to complete abruptly. In the later 
extensions abrupt completion can also be due to return from procedure execution 
(in Javac) or to raising and handling of exceptions (in Javax). 

Signature. Any abrupt completion always has an associated reason, which in 
Javax is a break or continue with a given label. For a uniform formulation of 
these interrupts (which supports the process oriented view of interrupt descrip- 
tion in the LRM) we introduce a universe 

    Reason ::= Break(Lab) | Continue(Lab)

(to be extended in Javac and Javax) together with a dynamic function

    mode : Reason

which records whether the current mode is normal (mode is undefined) or
abrupted, due to a Break or a Continue with a specific label.

When execution is abrupted control transfers up the grammatical nesting
level. To formalize this we use an auxiliary function

    up : Phrase → Phrase

which, applied to a phrase, returns the next enclosing phrase
that might handle the reason for abruption. (We say that phrase A is enclosed
by phrase B, if B contains A.)

In Javax up points only to labeled statements. Formally, let c be any n-ary
phrase constructor but not a labeled statement, i.e. phrase = c(phrase_1, ...,
phrase_n), then we set up(phrase_i) = up(phrase), 1 ≤ i ≤ n. If phrase denotes
a labeled statement, i.e. phrase = lab : stm_1, we set up(stm_1) = phrase. And
for the given program up returns finished. (We shall extend this definition for
static, try-catch, try-finally and synchronized clauses in later sections.)

The function up is used to update the dynamic function task, formalized by
the macro:

    abrupt ≡ task := up(task)

The function up supports the uniform treatment of different forms of
interrupts: in each of them the control flow is transferred up the grammatical nesting
level and then up the method invocation stack. Whereas for break and continue
the control remains within the given method, for return the control leaves the
current method (see Sect. 3.2) and for exceptions the control can climb up the
whole method invocation stack (see the later section on exceptions).

Transition rules. In Javax a break statement or a continue statement (which 
can be assumed to appear in the annotated syntax tree with label lab) transfers 
control to the (innermost) enclosing labeled statement having lab as its label — 
Java’s context conditions guarantee the existence of such a labeled statement. 
This labeled statement, which is called the target, and all statements which 
are passed during this transfer complete abruptly, the reason being a Break or 
Continue with label lab. 

if task is jump lab; then                                    (Jump)
    mode := Jump(lab)
    abrupt

for (jump, Jump) ∈ {(break, Break), (continue, Continue)}

If task points to a labeled statement (due to the definitions of fst and nxt
this can happen only in abrupted mode) it is checked whether the label of the
reason agrees with the label of the statement. In case it does, and the reason is
a Break, execution proceeds normally at the next phrase of the target; in case
of a Continue the next iteration of the embedded statement is executed, which
for while statements is the first phrase of the target (§ 14.13, 14.14, 14.10). (If
we included do and for statements, the following rule would have to be refined.)

if task is lab : stm then                                    (LabStm)
    if mode = Jump(lab) then
        mode := undef
        task := jump
    else abrupt

for (Jump, jump) ∈ {(Break, nxt(task)), (Continue, fst(stm))}
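The interplay of up, Jump and LabStm can be traced on a small nested example. This is a sketch under stated assumptions: `Phrase`, `set_up` and `transfer` are hypothetical names, the special phrase finished is modelled as `None`, and instead of computing nxt and fst of the target the function only reports which of the two would be used.

```python
from typing import Optional, Tuple

FINISHED = None  # stands for the special phrase 'finished'


class Phrase:
    def __init__(self, kind: str, *subs: "Phrase", label: Optional[str] = None):
        self.kind, self.subs, self.label = kind, list(subs), label
        self.up: Optional["Phrase"] = FINISHED


def set_up(phrase: Phrase, enclosing: Optional[Phrase] = FINISHED) -> None:
    """up(phrase_i) = up(phrase), except below a labeled statement."""
    phrase.up = enclosing
    inner = phrase if phrase.kind == "labeled" else enclosing
    for sub in phrase.subs:
        set_up(sub, inner)


def transfer(jump: Phrase) -> Tuple[str, Optional[str]]:
    """Apply Jump once, then LabStm until the matching target is found.

    Returns ("nxt", lab) if execution resumes after the target (Break),
    ("fst", lab) if the embedded statement is restarted (Continue).
    """
    mode = (jump.kind, jump.label)          # mode := Break(lab) / Continue(lab)
    task = jump.up                          # abrupt
    while task is not FINISHED:
        if mode[1] == task.label:           # labels agree: back to normal mode
            return ("nxt" if mode[0] == "break" else "fst", task.label)
        task = task.up                      # else abrupt
    return ("finished", None)               # excluded by Java's context conditions


# outer: while (...) { inner: while (...) { break outer; continue inner; } }
brk = Phrase("break", label="outer")
cont = Phrase("continue", label="inner")
inner = Phrase("labeled", Phrase("while", Phrase("block", brk, cont)), label="inner")
outer = Phrase("labeled", Phrase("while", Phrase("block", inner)), label="outer")
set_up(outer)

break_target = transfer(brk)
continue_target = transfer(cont)
```

The break passes the inner labeled statement, which completes abruptly, before it reaches its target.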

Jump transfers control immediately to the nearest enclosing matching
labeled statement. We could have included this into the definition of nxt, avoiding
Jump and LabStm. We prefer to formulate a rule here in order to smooth
the refinement in the context of exception handling and multithreading, where
control may not directly proceed at the first or next phrase of the target but has
to execute certain clauses first (see the corresponding later sections).

3 Adding Classes 

Javac extends Javax to an object-based language. Javac includes class fields,
class methods and class initializers. (These entities are also known as static fields,
static methods and static initializers, respectively.) Javac also supports a limited
form of interfaces, namely their static fields. Conceptually Javac describes the
semantics of an imperative language supporting modules. Modula-2 is a typical
representative of such a language: Class fields are the module’s global variables,
class methods are the module’s procedures and class initializers correspond to
module initializers.

Fig. 5 Abstract Java Syntax for Javac

Exp    ::= ... | FieldSpec | FieldSpec = Exp | MethodSpec(Exp, ..., Exp)
Stm    ::= ... | return Exp; | return;
Phrase ::= ... | Init
Init   ::= static Block endstatic



3.1 Signature 

Figure 5 shows Javac’s abstract syntax, where ‘...’ stands for the constructs of
Javax. The abstract syntax uses the universes FieldSpec and MethodSpec, which
denote class fields and overloaded class methods, respectively.

    FieldSpec     = Class × Field
    MethodSpec    = Class × Method × Functionality
    Functionality = Type* × (Type ∪ {void})

These type abbreviations use Javac ’s new abstract universes 

Class, Field, Method 

denoting the given program’s fully qualified classes and interfaces, as well as 
field and method identifiers, respectively. Furthermore, we extend the universe 
Var of Javax to denote local variables and method parameters. 

Unless otherwise stated we will not distinguish classes and interfaces. So 
whenever we speak of classes, class fields or class initializers, this normally in- 
cludes interfaces, their fields and initializers, respectively. 

For the sake of simplicity, but without loss of generality, we assume that due 
to the elaboration phase the annotated syntax trees in Javac have the follow- 
ing properties: Method specifications denote the most specific method chosen at 
compile-time (§ 15.11.2.2). Every execution path of any method body ends with 
a return statement. Any class has a class initializer — its body (whose function 
is to initialize the class fields at the first active use of the class, see below) may 
be empty. Non constant class field initializations are syntactically reduced to as- 
signments and are placed at the beginning of a class initializer. Javac abstracts 
from initializations of constant fields; the latter are final class fields, whose 
values are compile-time constants (§ 15.27). The value of constant fields is pre- 
computed (as part of the elaboration phase) and stored in the program’s class 
and interface environment. 

Javac programs are executed w.r.t. a static class environment, which is set up 
during parsing and elaboration. Each class (not interface) declaration consists 
of the superclass of the class, of its implemented interfaces, its class fields, class 
methods, and its initializer. Since interfaces do not contain code,






an interface declaration consists only of its superinterfaces, its static fields and 
its static initializer. 

The following static functions look up information in the environment and 
either access the environment directly or traverse the inheritance hierarchy from 
bottom to top (subtype to supertype): 



    super           : Class → Class
    supers          : Class → ℘ Class
    interfaces      : Class → ℘ Class
    classInit       : Class → Init
    classFields     : Class → ℘ Field
    classFieldValue : FieldSpec → Value
    classMethod     : MethodSpec → Var* × Block



The function super returns the direct superclass of the specified class, provided 
the class has a superclass. The function supers calculates the transitive closure 
of the direct superclass relationship. The function interfaces returns the direct 
interfaces of a class. The function classInit returns the body of the class initializer
of the given class. The function classFields returns the set of all fields declared 
by the class (which have to be initialized exactly once for the whole class, just 
before their first use, as specified in the LRM). The function classFieldValue 
maps non-constant class fields to their default values, and constant class fields 
to the values of their respective compile-time constant expressions. The function 
classMethod looks up the method in the specified class. 

In Javac we distinguish three initialization states for a class: either the ini- 
tialization of the class has not yet started, it is InProgress or it is already Done, 
so that we introduce a universe 

    InitState ::= InProgress | Done

together with a dynamic function

    init : Class → InitState

which records the current initialization status of a class. A class is ‘initialized’
if initialization for the class is InProgress or Done. (If the initialization of a class
has not yet started, init(class) is undefined.)

    initialized(class) ≡ init(class) ∈ {InProgress, Done}

To model the dynamic state of class fields, we have to reserve storage for all 
these variables. The dynamic function glo returns the value stored under a field 
specification. 

    glo : FieldSpec → Value

The introduction of methods has a fundamental effect on the definition of
Javax’s dynamic objects. During execution of a Javax (method) body the local
environment and the value of an expression are computed and both may then
be used in future computation steps (of that method). However, if in Javac
a method invokes itself recursively, the same statements and expressions are
executed several times, thus ‘overwriting’ the local environment as well as values
of expressions. To handle this properly, we refine the two functions loc and val
by indexing the value of any expression or (local) variable in the environment
with the calling depth of the code in execution. This corresponds to the usual
technique of introducing stacks to handle recursive method activation. Likewise,
we introduce a task stack, whose topmost element always points to the code in
execution. When a method is invoked, we push the first phrase of the invoked
method onto the stack, so that the task of the invoker can be resumed after the
invoked method has finished and its task is popped from the stack. The signatures
of the modified dynamic functions of Javax are extended to finite sequences:

    tasko : Phrase*
    valo  : (Exp → Value)*
    loco  : (Var → Value)*

During execution of a single method body in Javax the calling depth does 
not change. Hence, task, val and loc of Javax are guaranteed to be the topmost 
elements of the corresponding dynamic functions in Javac. 

    task = top(tasko)
    val  = top(valo)
    loc  = top(loco)

Via this refinement Javac can be shown to be a conservative extension of Javax 
(see the conclusion) so that all propositions which are valid for Javax carry over 
mutatis mutandis to Javac. 

To simplify notation in connection with method invocation and return we
introduce frames, denoting triples consisting of the stacks tasko, valo and loco.

    frames ≡ (tasko, valo, loco)

Finally, we extend the universe Reason, since return statements always
complete the method’s body abruptly. We introduce the new reasons Return and
Result, the latter carrying a specific Value which (eventually, see Sect. 3.2)
becomes the value of the method invocation.

    Reason ::= ... | Return | Result(Value)

3.2 Transition Rules 

Via the refinement of task, loc and val in Javac, each Javax-rule becomes a rule 
of Javac. Therefore it only remains to give here the rules for the evaluation of 
the new Javac-expressions and for the execution of the new Javac-statements. 

The initial state of Javac is as follows: The environment, modeled by the
respective lookup functions and predicates, is defined by the given program,
which consists of a list of classes and interfaces. For any method body and class
initializer, the functions fst, nxt and up are defined as given in Sect. 2; only
return and static need new definitions. All class fields of all classes are set to
their default or constant values and no class is initialized.

The run of Javac starts by invoking the class method main : MethodSpec
(with an empty parameter list), which is part of the environment. Initially, task
denotes the first phrase of main’s body, and loc and val are undefined. Formally: frames =
start(fst(body)), where the macro ‘start’ is used to initialize the task, temporary
and locals stacks, respectively. (The expression ⟨...⟩ denotes finite sequences.)

    start(phrase) ≡ (⟨phrase⟩, ⟨∅⟩, ⟨∅⟩)

For the sake of exposition let us first assume that all classes have been
initialized. When and how classes are initialized is explained in Sect. 3.3.

The run terminates if no rule of Javac can be executed any more. If the
execution completes without any run-time violation, the ASM reaches tasko =
⟨finished⟩; this means that main’s body has been executed successfully.

Fields. The value of a class field access is the value bound under the name of 
the class and of the field in the global environment (§ 15.10). 

if task is (class, field) ∧ initialized(class) then          (CFieldAcc)
    val(task) := glo(class, field)
    proceed

The value of a class field assignment is the value of its right-hand side 
(§ 15.25). The execution of the assignment replaces the existing value bound 
under the class and field’s name in the global environment by the result of the 
right hand side. 

if task is ((class, field) = exp) ∧ initialized(class) then  (CFieldAss)
    glo(class, field) := val(exp)
    val(task) := val(exp)
    proceed



Methods. A class method invocation is used to call a class method (§ 15.11).
The value of a method invocation is the return value of the invoked method;
this is specified in the rules Result and Return. Through method invocation new
bindings are created in the environment, binding the actual argument
values to the method’s parameters. The execution begins at the first phrase of the
invoked method’s body. Formal parameters and the method’s body are looked
up in the environment.

if task is ((class, method, fcty)(exp_1, ..., exp_n)) ∧
   initialized(class) then                                   (CMethod)
    frames := invoke(⟨val(exp_1), ..., val(exp_n)⟩, args, fst(body), frames)
    where (args, body) = classMethod(class, method, fcty)

Fig. 6 Normal control flow of return / throw

let stm = return exp; in       let stm = throw exp; in
fst(stm) = fst(exp)            fst(stm) = fst(exp)
nxt(exp) = stm                 nxt(exp) = stm

The macro ‘invoke’, defined below, pushes the new phrase to be executed on the
method call stack, an everywhere undefined function on the temporary stack
and the new bindings on the environment stack, respectively. (Concatenation of
sequences is denoted by juxtaposition, for example ⟨1⟩ ⟨2⟩ = ⟨1, 2⟩.)

    invoke(⟨val_1, ..., val_n⟩, ⟨var_1, ..., var_n⟩, phrase, (tasks, vals, locs)) ≡
        (⟨phrase⟩ tasks, ⟨∅⟩ vals, ⟨{(var_1, val_1), ..., (var_n, val_n)}⟩ locs)
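The macros ‘start’ and ‘invoke’ can be mirrored with immutable tuples, where the topmost stack element is written first as in the sequences above. This is a minimal sketch: phrases are represented by plain strings, everywhere undefined functions by empty dictionaries, and the names `top`, `start` and `invoke` follow the text but their Python signatures are assumptions.

```python
from typing import Any, Dict, Sequence, Tuple

Frames = Tuple[Tuple[Any, ...], Tuple[Dict, ...], Tuple[Dict, ...]]


def top(stack: Sequence[Any]) -> Any:
    """The topmost element is the first one of the sequence."""
    return stack[0]


def start(phrase: Any) -> Frames:
    """start(phrase) = (<phrase>, <empty>, <empty>)."""
    return ((phrase,), ({},), ({},))


def invoke(values: Sequence[Any], params: Sequence[str],
           phrase: Any, frames: Frames) -> Frames:
    """Push the new task, an empty val function and the new bindings."""
    if len(values) != len(params):
        raise ValueError("argument and parameter lists differ in length")
    tasks, vals, locs = frames
    bindings = dict(zip(params, values))
    return ((phrase,) + tasks, ({},) + vals, (bindings,) + locs)


frames = start("fst(main body)")
frames = invoke((3, 4), ("x", "y"), "fst(add body)", frames)
tasks, vals, locs = frames
```

After the call, `top(tasks)` is the first phrase of the invoked body and `top(locs)` binds the parameters, while the invoker's entries remain untouched underneath.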

Methods return when a return statement in their respective bodies is en- 
countered. 

A return expression statement is executed by first evaluating its
subexpression. (The control flow aspect of this execution is shown in Fig. 6.) If the
expression evaluation completes normally, the return statement abrupts
processing the method’s body, and attempts to transfer control and the value of the
expression to the invoker of the method. This is achieved in several steps: First,
the return statement completes abruptly, the reason being a Result with the
value of the subexpression.

if task is return exp; then                                  (Result)
    mode := Result(val(exp))
    abrupt

If there are still enclosing statements, they too have to be abrupted. For Javac 
the only enclosing statement kind pointed to by up is a labeled statement and 
it is easy to see that (by execution of LabStm) the labeled statement abrupts, 
since the reason for abruption is a Result. 

If there is no enclosing statement any more, i.e. when task becomes finished,
execution of the return statement transfers the value of the expression to the
invoker (provided there is still one), deletes the topmost bindings, and continues
processing normally at the invoker’s next phrase (§ 14.15). In case there is no
invoker anymore (main has terminated), the ASM run finishes.

if task is finished ∧ mode = Result(res) ∧ length(tasko) > 1 then   (Result’)
    mode := undef
    frames := result(res, frames)

The modification of the task, temporary and local environment stack is
summarized in the macro ‘result’, see below.

A return statement without an expression has the same semantics as the
return expression statement, except that no expression needs to be evaluated
and consequently no value needs to be transferred from the invoked method to
the invoker (§ 14.15).

if task is return; then                                      (Return)
    mode := Return
    abrupt

if task is finished ∧ mode = Return ∧ length(tasko) > 1 then (Return’)
    mode := undef
    frames := return(frames)

Fig. 7 Normal control flow of a class initializer

let init = static block endstatic in
fst(init) = static block        nxt(static block) = fst(block)
nxt(block) = endstatic          nxt(endstatic) = finished

Result’ and Return’ use macros to determine the next phrase to execute, to
transfer the expression’s value to the invoker (only ‘result’), and to delete the
topmost bindings.

    result(res, (⟨_, inv⟩ tasks, ⟨_, val⟩ vals, ⟨_⟩ locs)) ≡
        (⟨nxt(inv)⟩ tasks, ⟨val ⊕ {(inv, res)}⟩ vals, locs)

    return((⟨_, inv⟩ tasks, ⟨_⟩ vals, ⟨_⟩ locs)) ≡
        (⟨nxt(inv)⟩ tasks, vals, locs)

The anonymous variable _ stands for values that are irrelevant. ‘result’ uses the
operator ⊕ to override the invoker’s val function at the phrase of the invocation.
(An expression f ⊕ {(k, v)} is still a function, which returns the same values as
f everywhere except at the argument k, where it returns the value v.)
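The two macros and the override operator can be sketched as follows. The code makes some assumptions: frames are triples of tuples with the topmost element first, the static function nxt is passed in as a callable, and `override`, `result` and `return_` (with a trailing underscore, since `return` is a Python keyword) are hypothetical names.

```python
from typing import Any, Callable, Dict


def override(f: Dict[Any, Any], key: Any, value: Any) -> Dict[Any, Any]:
    """f (+) {(key, value)}: equal to f except at key, where it yields value."""
    g = dict(f)
    g[key] = value
    return g


def result(res: Any, frames, nxt: Callable[[Any], Any]):
    """Pop the callee, store res at the invocation phrase, continue at nxt(inv)."""
    (_, inv, *tasks), (_, val, *vals), (_, *locs) = frames
    return ((nxt(inv), *tasks), (override(val, inv, res), *vals), tuple(locs))


def return_(frames, nxt: Callable[[Any], Any]):
    """Like result, but no value is transferred to the invoker."""
    (_, inv, *tasks), (_, *vals), (_, *locs) = frames
    return ((nxt(inv), *tasks), tuple(vals), tuple(locs))


# The callee has reached 'finished'; the invoker waits at the phrase 'call f(1)'.
nxt = {"call f(1)": "phrase after call"}.__getitem__
frames = (("finished", "call f(1)"), ({}, {"a": 7}), ({"p": 1}, {"x": 0}))

after_result = result(42, frames, nxt)
after_return = return_(frames, nxt)
```

Because `override` copies the dictionary, the invoker's old val function is left unchanged, which matches the reading of ⊕ as producing a new function.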

3.3 Initialization 

Execution starts in a state in which no class is initialized. A class or interface 
will be initialized at its first active use by executing its static initializer. Before 
a class is initialized its superclasses must be initialized. The superinterfaces of 
an interface need not be initialized before the interface is initialized (see below) . 
This leads to three rules for class initialization: FirstActiveUse invokes the class
initializer at the first active use of a class. Static starts the execution of the class
initializer code or invokes the initialization of the direct superclass. Endstatic
terminates a static initialization block. (The control flow is shown in Fig. 7.)

The first active use of a class or interface T can occur if a class method
declared in T is invoked, or if a (non-constant) static field declared in T is used
or assigned. (A later section defines the case in which an instance of T is created.)
The rule

if (task is (class, field) ∨ task is ((class, field) = exp) ∨
    task is ((class, method, fcty)(exp_1, ..., exp_n))) ∧
   ¬initialized(class) then                                  (FirstActiveUse)
    frames := initialize(class, frames)

uses the macro ‘initialize’, which abbreviates (method) invocation:

    initialize(class, frames) ≡ invoke(⟨⟩, ⟨⟩, fst(classInit(class)), frames)

Note that by definition calling main, where main = (startClass, _), is a first
active use of startClass. Thus we have to trigger the initialization of startClass
before we process the first phrase of main’s body. We set frames =
initialize(startClass, start(fst(body))).

The execution of a class initializer records that starting from now, initializa- 
tion of this class is InProgress. 

if task is static block then                                 (Static)
    init(currClass) := InProgress
    enter

If the class really represents a class rather than an interface, the macro ‘enter’
invokes the class initialization of its direct superclass (if any), provided it is not
initialized; otherwise execution enters the computation of the static block.

    enter ≡ if supers(currClass) ≠ ∅ ∧ ¬initialized(super(currClass)) then
                frames := initialize(super(currClass), frames)
            else task := fst(block)

The macro ‘currClass’ always returns the class which contains the given phrase
using the static function classScope : Phrase → Class.

    currClass ≡ classScope(task)

But what happens if the current class is actually an interface rather than
a class? The LRM says that the superinterfaces may be initialized. However,
if the initialization of any of the superinterfaces has a side-effect or fails (as
discussed in a later section), this leads (at least on different machines) to
nondeterministic behaviour contradicting Java’s design goals. Therefore we refrain
from modelling this nondeterminism and rather present the solution of Sun’s Java
Development Kit (JDK), which does not initialize superinterfaces.

After having executed the static phrase, its block is executed; except that 
return statements cannot be part of the block, there are no restrictions on the 
kinds of possible statements or expressions. Note however, that when a static 
block of class T is executed, accesses to T’s fields and invocations of T’s methods 
must be possible without triggering a new first use of T . This is the reason why 
we included InProgress in the predicate initialized. 

When the phrase endstatic is executed, it records that the initialization is 
Done, and returns to the invoker. 

if task is endstatic then                                    (Endstatic)
    init(currClass) := Done
    exit

The macro ‘exit’ distinguishes whether this initializer was called implicitly
(by a first use) or explicitly (during processing of a static phrase). In case the
execution of the class initializer was not triggered by a first use, processing has
to resume at the next task of the invoker, otherwise it has to go back to the last
task of the invoker.

    exit ≡ if invoker = static block then
               frames := return(frames)
           else frames := goBack(frames)
           where ⟨_, invoker⟩ _ = tasko

To pop the task, temporaries and local variables from their respective stacks, we
use the macro

    goBack((⟨_⟩ tasks, ⟨_⟩ vals, ⟨_⟩ locs)) ≡ (tasks, vals, locs)

On return from an implicitly invoked static initializer, task still points to the 
same phrase, but this time the class is initialized and processing can continue 
normally. 
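The order in which the rules FirstActiveUse, Static and Endstatic fire for a chain of superclasses can be simulated with a single task stack. This is a strongly simplified sketch under stated assumptions: tasks are pairs of a phrase kind and a class name, the body of each static block is reduced to one log entry, the val and loc stacks are omitted, and `run_first_use` is a hypothetical name.

```python
from typing import Dict, List, Optional

IN_PROGRESS, DONE = "InProgress", "Done"


def run_first_use(cls: str, superclass: Dict[str, Optional[str]],
                  log: List[str]) -> Dict[str, str]:
    """Simulate one active use of cls; return the final init function."""
    init: Dict[str, str] = {}

    def initialized(c: str) -> bool:
        return init.get(c) in (IN_PROGRESS, DONE)

    stack = [("use", cls)]                 # topmost task is the last element
    while stack:
        kind, c = stack[-1]
        if kind == "use":
            if initialized(c):
                log.append(f"use {c}")     # the ordinary rule can now fire
                stack.pop()
            else:                          # FirstActiveUse
                stack.append(("static", c))
        elif kind == "static":             # Static
            init[c] = IN_PROGRESS
            sup = superclass.get(c)
            if sup is not None and not initialized(sup):
                stack.append(("static", sup))      # enter: initialize super
            else:
                stack[-1] = ("block", c)           # enter: task := fst(block)
        elif kind == "block":
            log.append(f"init {c}")        # stands for executing the block
            stack[-1] = ("endstatic", c)
        else:                              # Endstatic
            init[c] = DONE
            stack.pop()                    # exit
            if stack and stack[-1][0] == "static":
                # return: the invoker continues at nxt(static block)
                stack[-1] = ("block", stack[-1][1])
            # otherwise goBack: the invoker's task is retried unchanged
    return init


log: List[str] = []
final_init = run_first_use("C", {"C": "B", "B": "A", "A": None}, log)
```

In this run the static blocks execute from the root of the hierarchy downwards, and the triggering phrase is re-executed only after its class is initialized.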



4 Adding Objects 

Javao, the ASM for the object-oriented sublanguage of Java, extends the
object-based Javac to an object-oriented language. Javao supports instance fields and
instance methods, instance creation and method overriding, type casts and null
pointers. In contrast to Javac one may say that Javao is object-oriented, since
its defining equation “module = type” holds.



4.1 Signature 

The abstract syntax of the object-oriented sublanguage of Java is given in Fig. 8,
where [new Class] denotes the optional appearance of the subphrase new Class.
The abstract syntax makes use of constructor specifications, which denote class
constructors (whose function is to initialize the instance fields during instance
creation):

    ConstrSpec = Class × Type*

Instance method invocations have an additional invocation Kind, which is
used for method lookup.

    Kind ::= Virtual | Nonvirtual | Super

Additionally, we extend the universe Type of Javax in Javao to include classes
and interfaces and the type null.

As a result of the parsing and elaboration phase the following holds: Field 
access using super is reduced to “ordinary” field access of the superclass’ field. 
Method invocations (including their invocation kinds) are attributed as specified 
in § 15.11.1-3. (The invocation kind Static is not needed here, since it is already 
handled by class methods. We do not introduce the invocation mode Interface, 
because it is semantically equivalent to Virtual.) Any constructor body (except
the body of the constructor of the class Object) either begins with an explicit
constructor invocation of another constructor in the same class, or with an
explicit invocation of a superclass constructor (§ 12.5). Every execution path of
any constructor or instance method body ends with a return statement.
Instance field initializers of class T are syntactically reduced to assignments. They
are replicated in all constructors of class T that call a superclass constructor.
The assignments immediately follow the call of the superclass constructor, which
guarantees that instance field initializers are evaluated only once per instance
creation.

Fig. 8 Abstract Java Syntax for Javao

Exp ::= ...
      | [new Class] ConstrSpec(Exp, ..., Exp)
      | this | Exp.FieldSpec | Exp.FieldSpec = Exp
      | Exp.MethodSpec{Kind}(Exp, ..., Exp)
      | Exp instanceof Class | (Class)Exp

The following static functions look up compile-time information in the envi- 
ronment: 



    instFields     : Class → ℘ FieldSpec
    instFieldValue : FieldSpec → Value
    instConstr     : ConstrSpec → Var* × Block
    instMethod     : MethodSpec × Class × Kind → Var* × Block
    compatible     : Class × Class → Bool



The function instFields calculates the set of instance fields declared by the 
specified class and all of its superclasses (if any). The function instFieldValue 
maps fields to their default values (as specified in the LRM § 4.5.4). The function 
instConstr looks up the required constructor. The function instMethod, starting
at a particular class, returns the (first) method declaration for the given method
specification. If the invocation mode is Nonvirtual, overriding is not allowed; the
specified method in the explicitly given class is the one to be invoked. Otherwise
the invocation mode is Virtual or Super and overriding may occur. If a method
with the given method specification is not implemented in the given class, the
superclass of that class is searched recursively; whatever that search finds
is the result (§ 15.11.4). The predicate compatible(src, tar) returns
true iff the reference type src is assignment compatible, as defined in
LRM § 5.2, with the reference type tar. For instance, if both src and tar are classes,
src must be equal to tar or src must be a subclass of tar.

The values of reference types are references to instances of classes. References
belong to the abstract dynamic universe

Reference.

We extend the universe Value in Java_O to include, besides the values of Java’s
primitive types, references and the value null.

Value ::= ... | Reference | {null}

In Java_O any class has its Class object. Since we ignore linking and loading
in this paper, the function classRef, which maps any class to its class object, is
not dynamic but static.

classRef : Class → Reference



To model the dynamic state of instances, we have to reserve storage for 
all instance variables and have to store to which class an object belongs. The 
function classOf returns the class of the object that is referred to by the reference.
The dynamic function dyn returns the value stored under a field specification of
an object.

classOf : Reference → Class
dyn     : Reference × FieldSpec → Value



4.2 Transition Rules 

The initial state and the termination conditions of Java_O are taken from Java_C.
Additionally, we require that in Java_O’s initial state classOf and classRef are
inverses of each other. Java_O has all the rules of Java_C and in addition the rules
below for the new object-oriented features.



Instance creation. In Java_O new class instances are explicitly created by
evaluating a class instance creation expression. The value of this expression is a
reference to the newly created object of the specified class type. The new object
contains new instances of all the fields declared in the specified class and its
superclasses (§ 15.8). The creation of a new instance proceeds as follows.

First, the term new class (which is the first subphrase of the instance creation
expression) is evaluated. If it is the first active use of the class, the class initializer
is invoked.

if task is new class ∧ ¬initialized(class) then          (IFirstActiveUse)
  frames := initialize(class, frames)

If the class is initialized, we generate a new instance:

if task is new class ∧ initialized(class) then           (NewInstance)
  newInstance(class,
              val(task) := ref)
  proceed

The macro ‘newInstance’ allocates a new reference, keeps track of its origin, sets
each new field to its default value, and executes the updates.

newInstance(class, updates) = extend Reference by ref
                                classOf(ref) := class
                                vary f over instFields(class)
                                  dyn(ref, f) := instFieldValue(f)
                                updates
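
The default-value step of newInstance and the placement of field initializers
directly after the superclass constructor call can be observed in plain Java.
The following is a minimal sketch; the class names DefaultsBase, DefaultsDerived
and NewInstanceDemo are hypothetical and serve only as illustration.

```java
// Hypothetical classes; they only illustrate the order of field initialization.
class DefaultsBase {
    final int seenBySuperConstructor;

    DefaultsBase() {
        // Runs before the field initializers of any subclass have been executed.
        seenBySuperConstructor = probe();
    }

    int probe() {
        return -1;
    }
}

class DefaultsDerived extends DefaultsBase {
    int x = 42; // executed as an assignment right after super() returns

    DefaultsDerived() {
        super();
    }

    @Override
    int probe() {
        return x; // still the default value while DefaultsBase() is running
    }
}

class NewInstanceDemo {
    public static void main(String[] args) {
        DefaultsDerived d = new DefaultsDerived();
        System.out.println(d.seenBySuperConstructor); // prints 0
        System.out.println(d.x);                      // prints 42
    }
}
```

The superclass constructor sees the default value 0 of x, because the
initializer x = 42 only runs once the superclass constructor has returned.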

Then the argument expressions are evaluated (if any). When the arguments
are evaluated the constructor is called. New bindings are created in the local
environment: the formal arguments var_i are bound to the actual ones val(exp_i); if
the constructor is part of a new instance creation expression, this is bound to the
newly generated reference, which is already assigned to val(new); otherwise
(the constructor is called explicitly) the value of this can be looked up in the
local environment.

if task is new constrSpec(exp_1, ..., exp_n) then        (Constr)
  frames := invoke(⟨this⟩ vals, ⟨this⟩ args, fst(body), frames)
  where (args, body) = instConstr(constrSpec)
        this = if new = new class then val(new) else loc(this)
        vals = ⟨val(exp_1), ..., val(exp_n)⟩



Fields and this. The value of a field access expression is the value of the field
in the object pointed to by the target reference (§ 15.10), provided it is not null.

if task is (exp.fieldSpec) ∧ val(exp) ≠ null then        (IFieldAcc)
  val(task) := dyn(val(exp), fieldSpec)
  proceed

The value of a field assignment expression is the value of its right-hand side
(§ 15.25), provided the target reference is defined. Execution of the assignment
redefines the value of the field in the object pointed to by the target reference
with the value of the right-hand side.

if task is (exp_1.fieldSpec = exp_2) ∧ val(exp_1) ≠ null then   (IFieldAss)
  dyn(val(exp_1), fieldSpec) := val(exp_2)
  val(task) := val(exp_2)
  proceed

The value of the keyword this is a reference to the object for which the
instance method was invoked (§ 15.7.2). This reference is bound at each method
invocation and can therefore be used at the current calling depth.

if task is this then                                     (This)
  val(task) := loc(this)
  proceed



Methods. An instance method invocation expression is used to invoke an
instance method (§ 15.11). The value of a method invocation expression is the
return value of the invoked method; this is specified in Result. Provided the
target reference is defined, we have to distinguish three cases to locate the
invoked method. If the invocation mode is

- Nonvirtual, then it denotes a private method. The instance method is looked up
  statically, searching its declaration at the current class. (Context restrictions
  guarantee that the method specification’s class and the explicitly given class
  agree.)

- Virtual, then the instance method is looked up dynamically, searching its
  current definition starting at the class of the object.

- Super, then the instance method is looked up dynamically, too. However,
  here the search for the method’s definition starts at the immediate superclass
  of the current class.

Through method invocation new bindings are created in the environment,
containing the bindings of the actual argument values to the method’s parameters,
and the target reference available as this. The execution begins at the first
phrase of the invoked method’s body.

if task is (exp.methodSpec{kind}(exp_1, ..., exp_n)) ∧   (IMethod)
   val(exp) ≠ null then
  frames := invoke(⟨val(exp)⟩ vals, ⟨this⟩ args, fst(body), frames)
  where (args, body) = instMethod(methodSpec, class, kind)
        vals  = ⟨val(exp_1), ..., val(exp_n)⟩
        class = case kind of Nonvirtual : currClass
                             Virtual    : classOf(val(exp))
                             Super      : super(currClass)
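
The three invocation kinds correspond to three ways of calling an instance
method in source code. The sketch below is only an illustration; the classes
KindAnimal, KindDog and InvocationKindsDemo are hypothetical names.

```java
// Hypothetical classes showing where the method search starts for each kind.
class KindAnimal {
    String name() {
        return "animal";
    }
}

class KindDog extends KindAnimal {
    @Override
    String name() {
        return "dog";
    }

    private String secret() {
        return "KindDog.secret";
    }

    // Nonvirtual: a private method, looked up statically in the current class.
    String viaPrivate() {
        return this.secret();
    }

    // Super: the search starts at the immediate superclass of the current class.
    String viaSuper() {
        return super.name();
    }
}

class InvocationKindsDemo {
    // Virtual: the search starts at the class of the object, not at the static type.
    static String viaVirtual(KindAnimal target) {
        return target.name();
    }

    public static void main(String[] args) {
        KindDog dog = new KindDog();
        System.out.println(viaVirtual(dog));  // prints dog
        System.out.println(dog.viaSuper());   // prints animal
        System.out.println(dog.viaPrivate()); // prints KindDog.secret
    }
}
```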



Dynamic typing. The value of an instanceof expression is true, if the value
of its operand is not null and the reference is compatible with the required type.
Otherwise the result is false (§ 15.19.2).

if task is (exp instanceof class) then                   (Instanceof)
  val(task) := val(exp) ≠ null ∧
               compatible(classOf(val(exp)), class)
  proceed

The value of a reference type cast expression is the value of its operand,
provided it is compatible with the required class or interface type or it is null
(§ 15.15).

if task is (class)exp ∧                                  (Cast)
   (val(exp) = null ∨ compatible(classOf(val(exp)), class)) then
  val(task) := val(exp)
  proceed

4.3 Arrays and Strings 

Most of the object-oriented concepts introduced in the previous subsection apply
equally well to arrays and strings; so we do not extend Java_O but rather sketch
the necessary extensions.

Java’s arrays are objects. Therefore, we can use the previously introduced
function classOf to store the array’s type and use the previously introduced dyn
function to also model the dynamic state of array components. However, since
components are accessed by natural numbers, we have to refine FieldSpec to also
include natural numbers. Every array also has an associated Class object, shared
with all other arrays with the same component type. So we also have to refine
the function classRef and the initialization of classOf. In the context of class
initialization an ambiguity arises. Whereas the LRM specifies that in an array
creation expression the array’s element type (provided it is a class or interface)
is initialized, this is left out in Java’s Virtual Machine specification, although
it is required as part of the resolution process. In fact, Sun’s JDK triggers the
initialization of the array’s element type in array creation expressions.

Java strings are unusual, in that the language treats them almost as if they 
were primitive types supporting literals; instead they are instances of the Java 
String class. Thus, strings are objects and we could model them accordingly, 
namely like arrays. However, string literals always refer to the same instance of 
class String — these string literals are interned, so as to share unique instances. 
This is in contrast to strings which are concatenated at run-time. To distinguish 
both kinds of strings we need a dynamic function interned which always holds 
the set of references of interned strings. Furthermore, we have to modify the 
initialization of Java_C, because strings can be assigned to constant fields. We
refine the static function classFieldValue, so as to return not only primitive 
values but also (constant) strings. The initialization of dyn must be refined 
accordingly, i.e. if classFieldValue maps a field to a string, the string must be 
stored. 
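
The difference between interned string literals and strings concatenated at
run-time can be checked with reference comparison. This is a small sketch; the
class InternDemo and its method names are hypothetical.

```java
// Hypothetical demo class; == compares references, equals compares contents.
class InternDemo {
    // Two occurrences of the same literal refer to one interned instance.
    static boolean literalsShareInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // A string built at run-time is a distinct object with equal contents.
    static boolean runtimeConcatIsDistinct() {
        String a = "hello";
        String part = "hel";    // not final, so part + "lo" is not a constant expression
        String b = part + "lo"; // concatenated at run-time
        return a != b && a.equals(b);
    }

    // Interning the run-time string yields the shared literal instance again.
    static boolean internRestoresSharing() {
        String part = "hel";
        String b = part + "lo";
        return b.intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());   // prints true
        System.out.println(runtimeConcatIsDistinct()); // prints true
        System.out.println(internRestoresSharing());   // prints true
    }
}
```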

5 Adding Exceptions 

Java_E extends Java_O with exceptions. We take particular care that our
refinement of Java_O by exceptions makes it transparent how break and continue
statements (of Java_I), return statements and the initialization of classes and
interfaces (in Java_C) interact with catching and handling exceptions. Exception
handling is a means of recovering from abnormal situations. Java’s exceptions
are represented by instances of class Throwable. Java distinguishes between
run-time exceptions (which correspond to invalid operations violating the semantic
constraints of Java), errors (which are failures detected by the executing
machine) and user-defined exceptions. We consider here only run-time and
user-defined exceptions, because errors are considered as belonging to the JVM and
are therefore ignored in the dynamic semantics.

5.1 Signature 

The abstract syntax presented in Fig. 9 defines the extension of Java_O by
Java’s exception handling constructs.

When an exception is thrown, processing completes abruptly. To model this
we extend the universe Reason with the reason Throw embedding the particular
exception of type Reference.

Reason ::= ... | Throw(Reference)

Fig. 9 Abstract Java Syntax for Java_E

Stm ::= ...
      | throw Exp
      | try Block catch (Class Var) Block ... catch (Class Var) Block
      | try Block finally Block endfinally

Exceptions propagate through the grammatical block structure of a Java 
method and then up the method call stack to the nearest dynamically enclosing 
catch clause of a try-catch statement that handles the exception. A catch 
clause handles an exception if the exception object is compatible with the de- 
clared type. 

In situations where it is desirable to ensure that after one block of code
another one is always executed, Java provides the try-finally statement. The
finally clause is generally used to clean up after the try clause. It is executed
if any portion of the try block is executed, regardless of how it completes. In the
normal case control reaches the end of the try block and then proceeds to the
finally block. If control leaves the try block abruptly, the code of the finally
block is executed before control transfers to the ‘intended’ interrupt destination.
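
Both the normal and the abrupt path through a try-finally statement can be
traced in a few lines of Java. The sketch below is illustrative only; the class
TryFinallyDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class recording the order in which the blocks are executed.
class TryFinallyDemo {
    static List<String> trace(boolean leaveAbruptly) {
        List<String> log = new ArrayList<>();
        run(leaveAbruptly, log);
        log.add("caller");
        return log;
    }

    private static void run(boolean leaveAbruptly, List<String> log) {
        try {
            log.add("try");
            if (leaveAbruptly) {
                return; // abrupt completion: the intended destination is the caller
            }
            log.add("end of try");
        } finally {
            log.add("finally"); // executed on both paths
        }
        log.add("after statement"); // reached only on the normal path
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // [try, end of try, finally, after statement, caller]
        System.out.println(trace(true));  // [try, finally, caller]
    }
}
```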

5.2 Transition Rules 

We assume that Java_E is initialized like Java_O and execution starts normally.
If all iterations and recursions of the given program terminate, Java_E’s final
state is tasko = ⟨finished⟩. Then, if mode is Return, execution has completed
normally; otherwise mode denotes the thrown exception, which is not caught by
the program.

Throwing exceptions. User-defined exceptions are thrown explicitly, using 
throw statements. Run-time exceptions are thrown, if certain semantic con- 
straints for binary operations, target expressions and reference type cast ex- 
pressions do not hold. 

A throw statement is executed by first evaluating the expression. (Figure J 
captures the definition of the normal control flow.) If this evaluation completes 
abruptly, the throw statement completes abruptly, the reason being the same as 
the abrupt completion of the expression. Otherwise, if the value of the expression
is null, a NullPointerException is thrown. If the value is not null,
the intended exception is thrown: the control flow is abrupted, the reason being
a Throw with the value of the subexpression (§ 14.16).
Fig. 10 Normal control flow of try-catch

let stm = try block catch (...) block_0 ... catch (...) block_n in
  fst(stm)    = fst(block)
  nxt(block)  = nxt(block_i) = nxt(stm),  0 ≤ i ≤ n
  up(block)   = catch (...) block_0 ... catch (...) block_n
  up(block_i) = up(stm),  0 ≤ i ≤ n

if task is throw exp; then                               (Throw)
  if val(exp) ≠ null then
    mode := Throw(val(exp))
    abrupt
  else fail(NullPointerException)

The macro ‘fail’ allocates an exception object, throws the exception and
abrupts (which starts the execution of the corresponding finally code, if there
is some, and the search for the appropriate exception handler; see the following
subsection on propagating and handling of exceptions).

fail(class) = newInstance(class, mode := Throw(ref)
                                 abrupt)

This definition omits the call of a class constructor. This is correct as long as
constructors only call superclass constructors. Otherwise ‘fail’ must be defined
to be equivalent to executing the following code: throw new class (class, ε)();

A binary expression throws an ArithmeticException, if the operator is an 
integer division or remainder operator and the right operand is 0 (§ 15.13, 15.14). 
The following rule refines Binary Exp of Sect.H 

if task is (exp_1 ⊗ exp_2) ∧ ⊗ ∈ {/, %} ∧ val(exp_2) = 0 then   (BinaryExp)
  fail(ArithmeticException)
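
The BinaryExp refinement matches what a Java program observes for integer
operands. The following sketch is illustrative; the class DivisionDemo and its
method names are hypothetical.

```java
// Hypothetical demo class: integer / and % with a zero right operand throw.
class DivisionDemo {
    static String divide(int left, int right) {
        try {
            return String.valueOf(left / right);
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    static String remainder(int left, int right) {
        try {
            return String.valueOf(left % right);
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(7, 2));    // prints 3
        System.out.println(divide(7, 0));    // prints ArithmeticException
        System.out.println(remainder(7, 0)); // prints ArithmeticException
    }
}
```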

An instance target expression throws a NullPointerException, if the 
operand is null. The following rule refines IFieldAcc, IFieldAss and IMethod
of Sect. 4.2.

if (task is exp.fieldSpec ∨ task is exp.fieldSpec = exp_2 ∨     (ITarget)
    task is exp.methodSpec{kind}(exp_1, ..., exp_n)) ∧
   val(exp) = null then
  fail(NullPointerException)

A reference type cast expression throws a ClassCastException, if the value 
of the immediately contained expression is neither null nor compatible with the 
required type (§ 15.15). The following rule refines Cast of Sect. 4.2.

if task is (class)exp ∧                                  (Cast)
   val(exp) ≠ null ∧ ¬compatible(classOf(val(exp)), class) then
  fail(ClassCastException)
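
The ITarget and Cast refinements can be exercised directly. This is a minimal
sketch; the class TargetAndCastDemo, its nested class Cell and the helper
classify are hypothetical names introduced for illustration.

```java
// Hypothetical demo class for null targets and incompatible casts.
class TargetAndCastDemo {
    static class Cell {
        int x;

        int getX() {
            return x;
        }
    }

    // Runs the action and reports how it completed.
    static String classify(Runnable action) {
        try {
            action.run();
            return "normal";
        } catch (NullPointerException e) {
            return "NullPointerException";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    static String fieldAccessOnNull() {
        Cell target = null;
        return classify(() -> { int unused = target.x; });
    }

    static String methodCallOnNull() {
        Cell target = null;
        return classify(() -> { int unused = target.getX(); });
    }

    static String incompatibleCast() {
        Object value = "text";
        return classify(() -> { Integer unused = (Integer) value; });
    }

    static String castOfNull() {
        Object value = null;
        return classify(() -> { Integer unused = (Integer) value; }); // null passes any cast
    }
}
```

Under these assumptions fieldAccessOnNull() and methodCallOnNull() report
NullPointerException, incompatibleCast() reports ClassCastException, and
castOfNull() completes normally.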
Propagating and handling exceptions. A try-catch statement is executed
by executing first the try clause. If execution of the try clause completes nor- 
mally, then this completes the try-catch statement normally. If in the try clause 
an exception is thrown, it is checked whether there is a catch clause, which can 
handle this exception. If the exception object is not compatible with any of the 
declared types, the exception propagates to the next higher enclosing block of 
code. Otherwise (the exception object is compatible with at least one declared 
type) the first (leftmost) compatible catch clause is selected, the exception is 
bound to the exception handler parameter and execution continues normally 
by executing the block of the selected catch clause. If processing of the latter 
abrupts, this abrupts the try-catch statement. 

Figure 10 shows the control flow in normal mode. In addition it defines the
function up, which is used when processing completes abruptly.

We need no rule for the try block, because the control flow abstracts from
this syntactic construct. For the selection of the compatible catch clause we use
the following transition rule, which uses the descriptive operator ι to determine
the catch clause that fulfills the given predicate.

if task is (catch (c_0 v_0) b_0 ... catch (c_n v_n) b_n) then   (Catch)
  if mode = Throw(exc) ∧ ∃ i : 0 ≤ i ≤ n : catches(c_i) then
    loc(v_k) := exc
    mode := undef
    task := fst(b_k)
    where k = ι i : 0 ≤ i ≤ n : catches(c_i) ∧
                                ∀ j : 0 ≤ j < i : ¬catches(c_j)
  else abrupt
  where catches(class) = compatible(classOf(exc), class)

If the thrown exception is compatible with any of the clauses, Catch assigns
the exception object to the exception handler parameter, resets the processing
mode, and proceeds normally.
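
The selection of the first (leftmost) compatible catch clause can be modelled
as a linear search. The following sketch is not part of the formal model; the
class CatchSelectionDemo and the method select are hypothetical, and
Class.isInstance stands in for compatible(classOf(exc), class).

```java
// Hypothetical model of the clause selection performed by the Catch rule.
class CatchSelectionDemo {
    /**
     * Returns the index k of the first declared class that is compatible with
     * the thrown exception, or -1 if no clause catches it (the rule abrupts).
     */
    static int select(Throwable exc, Class<?>... declared) {
        for (int i = 0; i < declared.length; i++) {
            if (declared[i].isInstance(exc)) {
                return i; // no clause with a smaller index catches exc
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int k = select(new IllegalArgumentException(),
                IllegalStateException.class, RuntimeException.class, Exception.class);
        System.out.println(k); // prints 1

        int none = select(new java.io.IOException(),
                IllegalStateException.class, RuntimeException.class);
        System.out.println(none); // prints -1
    }
}
```

Although both RuntimeException and Exception are compatible with the thrown
IllegalArgumentException, the clause with the smaller index is chosen.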

If an exception is not caught by the block of code that throws it, it propagates
to the next outer enclosing block of code. If an exception is not caught anywhere
in the method (i.e. task becomes finished), it cleans the bindings and the
temporaries of this method call, and returns to the invoking method, where it
again propagates through the block structure.

if task is finished ∧ mode = Throw(exc) ∧ length(tasko) > 1 then   (Throw’)
  frames := throw(frames)

To pop the task, temporaries and local variables from their respective stacks
and to restart the search for an exception handler in the invoking method, the
rule uses the macro

throw(⟨_, inv⟩ tasks, ⟨_⟩ vals, ⟨_⟩ locs) = (⟨up(inv)⟩ tasks, vals, locs)

If an exception is never caught, it propagates all the way up to main’s body,
where in Java_E processing gets stuck. This will be refined by exceptional thread
termination in the next section.
Fig. 11 Normal control flow of try-finally/synchronized

let stm = try block_0 finally block_1 endfinally in
  fst(stm)        = fst(block_0)
  nxt(block_0)    = up(block_0) = finally block_1
  nxt(block_1)    = up(block_1) = endfinally
  nxt(endfinally) = nxt(stm)
  up(endfinally)  = up(stm)

let stm = synchronized (exp) block endsynchronized in
  fst(stm)             = fst(exp)
  nxt(exp)             = synchronized (exp) block
  nxt(block)           = up(block) = endsynchronized
  nxt(endsynchronized) = nxt(stm)
  up(exp)              = up(endsynchronized) = up(stm)



Handling clean-up code. A try-finally statement is executed by executing
the try clause. Independently of whether the try clause completes normally or
abruptly, the finally clause is executed; see Fig. 11 for the definition of the
control flow functions fst, nxt and up. To restore thrown exceptions when the
finally clause is left, we introduce an initially empty dynamic function

finally : Mode*

to store the current execution mode when the finally clause is entered.

if task is finally block then                            (Finally)
  finally := ⟨mode⟩ finally
  mode := undef
  task := fst(block)

If this finally clause completes (task points to endfinally), there is a
choice: If the finally clause was entered in normal mode, the execution of the
try-finally statement proceeds normally. If the finally clause was entered
in abrupted mode, the execution abrupts again with the same reason given by
that abrupted mode. However, if a finally clause itself completes abruptly,
the pending control transfer is abandoned and this new transfer is processed
(§ 14.18). This is expressed by the transition rule for endfinally.

if task is endfinally then                               (Endfinally)
  finally := finally'
  if mode = undef ∧ mode' = undef then proceed
  elseif mode = undef ∧ mode' ≠ undef then
    mode := mode'
    abrupt
  else abrupt
  where ⟨mode'⟩ finally' = finally

We invite the reader to check that these rules correctly formalize the description 
of the LRM for handling exceptions and clean-up (§ 14.18). 
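
Two of the cases distinguished by Endfinally are easy to reproduce in Java:
a pending exception that is restored after a normally completing finally
clause, and a finally clause that completes abruptly and thereby discards the
pending exception. The class FinallyModeDemo below is a hypothetical sketch.

```java
// Hypothetical demo class for the pending mode saved on entering finally.
class FinallyModeDemo {
    // finally entered in abrupted mode and completing normally:
    // the saved reason is restored and the exception propagates further.
    static String pendingExceptionIsRestored() {
        StringBuilder log = new StringBuilder();
        try {
            try {
                throw new IllegalStateException("from try");
            } finally {
                log.append("finally;");
            }
        } catch (IllegalStateException e) {
            log.append("caught ").append(e.getMessage());
        }
        return log.toString();
    }

    // finally itself completes abruptly: the pending exception is abandoned.
    @SuppressWarnings("finally")
    static String abruptFinallyWins() {
        try {
            throw new IllegalStateException("from try");
        } finally {
            return "from finally";
        }
    }

    public static void main(String[] args) {
        System.out.println(pendingExceptionIsRestored()); // prints finally;caught from try
        System.out.println(abruptFinallyWins());          // prints from finally
    }
}
```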

Initialization of classes and interfaces. Java is a robust language. So we
also have to care about uncaught exceptions in static initializers. For them Java
specifies the following strategy: If the current class is in an erroneous state, then
initialization is not possible; throw a NoClassDefFoundError. If during
execution of the body of the static initializer an exception is thrown and this is not
an Error or one of its subclasses, then throw ExceptionInInitializerError.
(With respect to the subclassing of ExceptionInInitializerError, the LRM
is ambiguous. Whereas § 11.5.2 defines it to be a subclass of Error, § 20.23
defines it to be a subclass of RuntimeException, which is the correct definition,
since it signals an uncaught exception during execution and not during loading,
linking or preparing.)

We extend the universe InitState by a new element, signalling that a class is 
in an erroneous state. 



InitState ::= ... | Error

The following transition rule extends Static of Sect. ^3 to whose guard we
add the condition init(currClass) = undef. Furthermore, we assume the correct
extension of the function up, namely by up(static block) = finished.

if task is static block ∧ init(currClass) = Error then   (Static)
  fail(NoClassDefFoundError)

We have no extra transition rules for the static block. Instead we assume
that the static block acts like the try block of a try-catch statement, where
the endstatic phrase plays the role of the catch clause. This induces the
correct extension of the up function for this case: formally, let init = static
block endstatic; then up(block) = endstatic and up(endstatic) = finished.

According to the LRM, the rule for endstatic simply (re)throws the exception.

if task is endstatic ∧ mode = Throw(ref) then            (Endstatic)
  init(currClass) := Error
  if compatible(classOf(ref), Error) then
    frames := throw(frames)
  else fail(ExceptionInInitializerError)

This rule extends Endstatic of Sect.^H to whose guard we add the condition
mode = undef.
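
The two rules correspond to what a program observes when it uses a class whose
static initializer fails: the first use reports ExceptionInInitializerError,
and every later use finds the class in the erroneous state. The sketch below
is illustrative; StaticInitDemo, Broken and touch are hypothetical names.

```java
// Hypothetical demo class for a failing static initializer.
class StaticInitDemo {
    static class Broken {
        static int value = fail(); // static initializer that throws

        private static int fail() {
            throw new IllegalStateException("boom");
        }
    }

    // Triggers (or retries) the initialization of Broken and reports the outcome.
    static String touch() {
        try {
            return String.valueOf(Broken.value);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }
}
```

Calling touch() for the first time in a fresh JVM returns
"ExceptionInInitializerError"; a second call returns "NoClassDefFoundError",
because the class has been marked as erroneous.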



6 Adding Threads 

The preceding models are concerned with the behavior of Java executing a
single phrase at a time, that is by a single thread. In this section we extend Java_E
to Java_T, the model for multithreaded Java, which provides support for
execution of many different tasks working on shared main and local working memory.
We consider Java’s thread creation and destruction, its mechanisms for
synchronizing the concurrent activity of threads using locks, and Java’s waiting and
notification mechanism for efficient transfer of control between threads.
Fig. 12 Abstract Java Syntax for Java_T

Stm ::= ... | synchronized (Exp) Block endsynchronized
Exp ::= ... | Exp.start() | Exp.stop() | Exp.wait() | Exp.notify()



The reference manual specifies a memory model for shared memory multi- 
processors that support high performance implementations. It allows objects to 
reside in main and local working memory and presents rules (formalized as a 
particular event-structure by Cenciarelli et al. Q) specifying when a thread 
is permitted or required to transfer the contents of its working copy of an in- 
stance variable into the master copy in main memory or vice versa. In order to 
separate this memory model — which “details the low-level actions that may be 
used to explain the interaction of Java Virtual Machine threads with a shared 
memory” ^3 page 371] — from the semantics of the mechanisms defined by the 
language for thread creation, destruction, synchronization and for waiting and 
notification, we will first build a model for these mechanisms which uses only the 
main memory for storing objects and which agrees for best practice programs 
with the LRM memory model. 

“Best practice is that if a variable is ever to be assigned by one thread 
and used or assigned by another, then all accesses to that variable should 
be enclosed in synchronized statements.” § 17.13. 

In Sect. ^3 we define another model, which supports local working memo- 
ries and uses them as much as possible. The meaning of programs running on 
these two extreme memory models — both of which are in accordance with the 
semantics described in the LRM — agrees for best practice programs. 

6.1 Signature 

The abstract syntax of Java_T is given in Fig. 12 where, for ease of reading,
invocation kinds are suppressed.

Threads are concurrent independent processes running within a single pro- 
gram so that they correspond to code executing agents in distributed ASMs £3. 
For the modeling of threads we therefore use a universe 

Thread 

to formalize the objects belonging to the class Thread through which threads 
in Java are represented and controlled. Since threads are objects, the universe 
Thread is a subset of Reference. Every thread has its own state, consisting of
its tasko, loco, valo stacks and its execution mode, represented by the variables
mode and finally; it is impossible for one thread to access parameters or local
variables or the execution mode of another thread. Correspondingly we
parameterize each of these dynamic functions with its agent, i.e., we obtain the following
signatures of these functions in Java_T:

task_T    : Thread → Phrase*
val_T     : Thread → (Exp → Value)*
loc_T     : Thread → (Var → Value)*
mode_T    : Thread → Reason
finally_T : Thread → Reason*

During execution of a single thread the agent, denoted in Java_T by the logical
function self (which supports the self-identification of agents), does not change.
A consequence is that we can define our former functions as abbreviations of the
refined ones. We have:

tasko   = task_T(self)
valo    = val_T(self)
loco    = loc_T(self)
mode    = mode_T(self)
finally = finally_T(self)

Through these abbreviations the rules, macros and propositions (where indicated,
with some further refinement) carry over from Java_E to Java_T.

Threads exchange information among each other by operating on objects 
residing in shared main memory, which is modeled by the functions glo, dyn and 
classOf. 

To synchronize threads Java uses monitors, a mechanism for allowing only 
one thread at a time to execute a region of code protected by the monitor. The 
behavior of monitors is explained in terms of locks uniquely associated with 
objects. When a synchronized statement is processed, the executing thread 
must grab the lock associated with the target reference to become the owner 
of the lock before the thread can continue. Upon completion of the block the 
mechanism releases that very same lock. We use a dynamic function owns to
keep track of the dynamic nestings of synchronized statements; owns(thread)
denotes the stack of all references grabbed by thread. Since a single thread can
hold a lock more than once we have to define dynamic lock counters.

owns  : Thread → Reference*
locks : Reference → Nat
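
That a single thread can hold the same lock more than once is visible in Java
through nested synchronized statements on one object. The following sketch is
illustrative only; the class ReentrantMonitorDemo is a hypothetical name, and
the counter values in the comments refer to the locks function of the model.

```java
// Hypothetical demo class: the same thread re-enters a monitor it already owns.
class ReentrantMonitorDemo {
    static boolean[] nestedLocking() {
        Object lock = new Object();
        boolean before = Thread.holdsLock(lock);
        boolean inner;
        boolean afterInner;
        synchronized (lock) {       // lock counter 0 -> 1
            synchronized (lock) {   // same thread enters again: 1 -> 2
                inner = Thread.holdsLock(lock);
            }                       // 2 -> 1, the thread is still the owner
            afterInner = Thread.holdsLock(lock);
        }                           // 1 -> 0, the lock is free again
        boolean after = Thread.holdsLock(lock);
        return new boolean[] { before, inner, afterInner, after };
    }

    public static void main(String[] args) {
        // prints [false, true, true, false]
        System.out.println(java.util.Arrays.toString(nestedLocking()));
    }
}
```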

To assist communication between threads, each object also has an associated 
wait set, a set of threads. Wait sets are used by the statements wait and notify. 
The wait method allows a thread to wait for a notification condition. Execut- 
ing wait adds the current thread to the wait set for this object and releases 
the lock, which is reacquired to continue processing after the thread has been
notified by another thread. Wait sets are modeled by the dynamic function

waitSet : Reference → ℘(Thread).

Every thread can be in one of five states. This is modeled using the dynamic
function

exec : Thread → ThreadState

where the universe ThreadState is defined by

ThreadState ::= Runnable | Blocked | Notified | Exiting.

A thread T is in the initial state (exec(T) = undef) from the time
it is created until the start method of the Thread object is called, whereby it
becomes Runnable. A thread that is in the Blocked state is one that cannot be
run because a wait method has been called. A thread is in the Notified state 
once the notify method has been called for it. A thread is in the Exiting state 
once its run method has terminated or its stop method has been called. 

The stop method is an asynchronous method. It may be invoked by one 
thread, to affect another thread in its current point of execution. We use a 
dynamic function 

stopped : Thread → {Yes}

to signal to a thread that it has been stopped. We strengthen the macro task is
phrase for Java_T as follows:

task is phrase = top(task_T(self)) = phrase ∧
                 stopped(self) = undef ∧
                 exec(self) ∈ {Runnable, Notified, Exiting}

so that a thread is only allowed to execute a phrase if the thread is neither
stopped nor blocked.

The language reference manual leaves the scheduling strategy open. Although 
the language designers had a pre-emptive priority-based scheduler in mind, they 
explicitly say that there is no guarantee that threads with highest priority will 
always be running. Therefore, we abstract from priority based scheduling and 
use a ‘loose’ scheduling strategy. This means that we make the executability of 
a Java_T-rule, in addition to its being guarded by task is phrase, depend only
on the partial order conditions for distributed ASM runs, leaving the further 
specification for any particular scheduling open. 

6.2 Transition Rules 

The initial state of Java_T is like the one for Java_E. The run starts with a single
thread, called main, i.e., {main} = Thread. The thread main starts in normal
mode, the state of main is Runnable, and main’s task, temporary and local 
stacks are initialized as discussed in Sect.^H The remaining newly introduced 
dynamic functions are undefined. 

Execution continues until all threads are blocked or have exited, i.e. the run
of the distributed Java_T ASM terminates if no agent can execute any rule any
more.
Thread creation. There are two ways to create a new thread. One way is to
declare a class to be a subclass of Thread; this subclass should override the run
method of class Thread (the run method is the body of the thread). Creating
an instance of this class by NewInstance creates a new thread T, which is (by
default) in its initial state (so that in particular exec(T) = undef, stopped(T) =
undef, task_T(T) = undef, etc.). Since threads are modeled as agents of the
distributed ASM Java_T, by definition each newly created thread T gets the
rules of Java_T assigned for execution with T = self.

Executing a thread. The start method of class Thread is used to cause a 
thread (provided it is not null and it is started for the first time) to begin exe- 
cution by calling the run method of its Thread object (§ 20.20.14). If the thread 
to be started has already been started, an IllegalThreadStateException is 
thrown. 

if task is exp.start() ∧ val(exp) ≠ null then            (Start)
  if exec(val(exp)) = undef then
    newFrames := start(fst(body))
    exec(val(exp)) := Runnable
    proceed
  else fail(IllegalThreadStateException)
  where (⟨⟩, body) = instMethod(run, classOf(val(exp)), Virtual)
        newFrames = (task_T(val(exp)), val_T(val(exp)), loc_T(val(exp)))
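
The else branch of Start can be triggered by starting the same Thread object
twice. This is a minimal sketch; the class StartTwiceDemo and its method are
hypothetical names.

```java
// Hypothetical demo class: a thread may be started only once.
class StartTwiceDemo {
    static String startTwice() {
        Thread worker = new Thread(() -> { /* empty run method */ });
        worker.start(); // the thread leaves its initial state
        try {
            worker.start(); // the thread is no longer in its initial state
            return "started again";
        } catch (IllegalThreadStateException e) {
            return "IllegalThreadStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(startTwice()); // prints IllegalThreadStateException
    }
}
```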

A thread runs until the run method has nothing else to do or its stop method 
is called (see below). A thread also terminates if it could not handle an exception
which had occurred (and after having executed all relevant finally clauses); in
this case the uncaught exception method for the parent thread group is invoked.
(We have not specified the latter.) The transition rule for the normal termination
is as follows:

if task is finished ∧ length(tasko) = 1 ∧ exec(self) = Runnable then   (Terminate)
  exec(self) := Exiting

Thread synchronization. A synchronized statement is executed by first 
grabbing the lock of the object denoted by the target reference, provided it is 
not null. (If it is null a NullPointerException is thrown.) If the current thread 
already holds the lock, or if the lock is free and the current thread is the one 
chosen for execution, it grabs the lock, increments the lock counter and executes 
the block; this is summarized in the macro ‘lock’. 

    if task is synchronized (exp) block then    (Synchronize)
        if val(exp) ≠ null then
            lock(val(exp), task := fst(block))
        else fail(NullPointerException)

Upon completion of the block, either normally or abruptly, the
endsynchronized phrase is executed. (Figure^J defines the control flow.) Executing
endsynchronized releases the very same lock, namely the last in the sequence
of locks grabbed by the thread (this holds because synchronized and
endsynchronized phrases are correctly nested), and decrements the lock counter; see
the macro ‘unlock’. If execution has completed normally, processing continues
normally. If execution has completed abruptly, the reason for the abrupt
completion is propagated up the nesting level.

    if task is endsynchronized then    (Endsynchronize)
        unlock(top(owns(self)),
               if mode = undef then proceed
               else abrupt)

The macro ‘lock’ tests whether the thread already holds a lock on the given
reference. Whether the agent itself is the one chosen for locking the particular
object is captured by the macro ‘competing’, see below. If the thread already has
or gets the lock, it can enter the synchronized block; the lock counter
is incremented and the grabbed reference is pushed onto the owns stack. The
macro ‘unlock’ is the inverse of the macro ‘lock’. (elem ∈ list tests whether elem
is in the list.)



    lock(ref, updates) =
        if ref ∈ owns(self) ∨ (locks(ref) ∈ {0, undef} ∧ competing(ref) = self) then
            owns(self) := ⟨ref⟩ owns(self)
            locks(ref) := locks(ref) + 1
            updates

    unlock(ref, updates) =
        owns(self) := pop(owns(self))
        locks(ref) := locks(ref) - 1
        updates
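
The bookkeeping done by the two macros can be mimicked by a small sequential Java model. This is only a sketch under simplifying assumptions: threads are identified by strings, the class LockModel and its methods are invented for this illustration, and the scheduling choice made by ‘competing’ is replaced by "whoever calls first while the lock is free".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Sequential toy model of the 'lock' and 'unlock' macros (hypothetical names).
class LockModel {
    private final Map<Object, Integer> locks = new IdentityHashMap<>();  // locks(ref)
    private final Map<String, Deque<Object>> owns = new HashMap<>();     // owns(thread)

    // Assumption: the caller plays the role of the thread chosen by 'competing'.
    boolean lock(String thread, Object ref) {
        Deque<Object> stack = owns.computeIfAbsent(thread, t -> new ArrayDeque<>());
        boolean alreadyHeld = stack.stream().anyMatch(r -> r == ref);
        boolean free = locks.getOrDefault(ref, 0) == 0;
        if (!alreadyHeld && !free) {
            return false;                  // guard of 'lock' is false: nothing is updated
        }
        stack.push(ref);                   // owns(self) := <ref> owns(self)
        locks.merge(ref, 1, Integer::sum); // locks(ref) := locks(ref) + 1
        return true;
    }

    // Releases the most recently grabbed lock, i.e. top(owns(thread)).
    void unlock(String thread) {
        Object ref = owns.get(thread).pop();
        locks.merge(ref, -1, Integer::sum);
    }

    int count(Object ref) {
        return locks.getOrDefault(ref, 0);
    }
}
```

In this model a thread that already owns a reference may lock it again, while a second thread is refused until the counter has dropped back to zero.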



The macro ‘competing’ uses a scheduling function choose_sync, which is not
specified further, to return an arbitrary thread out of the set of threads competing
for the lock of the given reference. A thread competes for a lock if it executes a
synchronized statement, a wait expression, or the phrase static or endstatic,
provided the thread is in the appropriate state. We specify the macro ‘competing’
in tabular form, where classObj(t) abbreviates classRef(classScope(top(task_T(t)))):

    competing(ref) = choose_sync{ t ∈ Thread |
                         top(task_T(t)) = • ∧ ref = • ∧ state(t) ∈ • }

    top(task_T(t))              ref                   state(t)
    --------------------------  --------------------  -------------------
    synchronized (exp) block    top(val_T(t))(exp)    {Runnable, Exiting}
    exp.wait()                  top(val_T(t))(exp)    {Notified}
    static block                classObj(t)           {Runnable, Exiting}
    endstatic                   classObj(t)           {Runnable, Exiting}


It is easy to show that upon leaving a synchronized statement, the lock
counters and the stack of grabbed references are exactly as they were when the
statement was entered.

Thread notification. The wait method of class Object causes the current
Runnable thread to wait until some other thread invokes the notify method for
this object. This method can be called only when the current thread is already
synchronized on this object; otherwise an IllegalMonitorStateException is
thrown. Executing wait disables the thread from executing: it changes the state
of the thread from Runnable to Blocked, adds the current thread to the wait set
for this object and releases the lock. The LRM does not specify what happens if
threads that are already Exiting wait. We decided that such a thread should behave
as if it were Runnable. (Other choices are conceivable and easily formalized
by changing our rules.) To remember that a thread is already exiting we introduce
a new dynamic function

    backup : Thread → ThreadState

which is set when the current thread enters the wait expression and is used
when it proceeds from there.

    if task is (exp.wait()) ∧ exec(self) ∈ {Exiting, Runnable} ∧ val(exp) ≠ null then    (Wait)
        if val(exp) ∈ owns(self) then
            backup(self) := exec(self)
            enable(val(exp), self, Blocked)
        else fail(IllegalMonitorStateException)

To continue processing, the thread first has to be notified by another thread.

The notify method of class Object chooses one thread among those waiting
on this object. The choice is left unspecified by the Java LRM; we reflect
this by introducing yet another choice function which is not specified further,
say choose_notify. The chosen thread is then removed from the wait set and its state
is changed from Blocked to Notified. We say the thread is awakened. The notify
method may be called only when the current thread is already synchronized on
this object; otherwise an IllegalMonitorStateException is thrown.

    if task is (exp.notify()) ∧ val(exp) ≠ null then    (Notify)
        if val(exp) ∈ owns(self) then
            if waitSet(val(exp)) ≠ ∅ then
                awake(val(exp), choose_notify(waitSet(val(exp))), Notified)
            proceed
        else fail(IllegalMonitorStateException)
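
The ownership test val(exp) ∈ owns(self) in Wait and Notify corresponds to behavior that can be checked in plain Java. The following sketch is illustrative only; the class name MonitorStateCheck and its helper methods are invented for this example.

```java
// Hypothetical helper class probing the ownership check of wait and notify.
class MonitorStateCheck {
    static boolean waitWithoutLockFails() {
        Object o = new Object();
        try {
            o.wait(1);                    // the current thread does not own o's monitor
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    static boolean notifyWithoutLockFails() {
        Object o = new Object();
        try {
            o.notify();                   // again called outside a synchronized block
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    static boolean notifyWithLockSucceeds() {
        Object o = new Object();
        synchronized (o) {
            o.notify();                   // empty wait set: nothing to awake, the call returns
        }
        return true;
    }
}
```

The last helper reflects the case of Notify in which the wait set is empty and the rule simply proceeds.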

The notified thread whose task still points to the wait method invocation 
then competes in the usual manner with other threads for the right to synchro- 
nize on the object. Once the reenabled thread has gained control of the object, 
all its synchronization claims are reacquired, its state switches to the state in 
which wait was entered (which is either Runnable or Exiting) and its execution 
continues normally. 

    if task is (exp.wait()) ∧ exec(self) = Notified ∧ val(exp) ≠ null then    (Wait)
        reenable(val(exp), self, backup(self), proceed)

The macros ‘enable’, ‘awake’ and ‘reenable’ show the relation between these
three notification phases. (occs(list, elem) yields the number of occurrences of
elem in the list.)

    enable(ref, thread, state) =
        waitSet(ref) := waitSet(ref) ∪ {thread}
        locks(ref) := 0
        exec(thread) := state

    awake(ref, thread, state) =
        waitSet(ref) := waitSet(ref) \ {thread}
        exec(thread) := state

    reenable(ref, thread, state, updates) =
        if locks(ref) ∈ {0, undef} ∧ competing(ref) = thread then
            locks(ref) := occs(owns(thread), ref)
            exec(thread) := state
            updates

It is easy to show that upon return from the wait method, the synchronization
state of the object for this thread is exactly as it was when the wait
method was invoked.
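
The three phases can be watched in a small Java program. This is a sketch, not a test of the model: the class WaitRoundTrip, its fields and the two-flag handshake are choices made for this example. The waiting thread holds the monitor twice, so the main thread can only enter the monitor because wait releases it completely (compare locks(ref) := 0 in ‘enable’).

```java
// Hypothetical demo of enable / awake / reenable with a nested lock.
class WaitRoundTrip {
    private final Object monitor = new Object();
    private boolean waiterReady = false;    // guarded by monitor
    private boolean go = false;             // guarded by monitor
    private boolean heldAfterWait = false;  // written by the waiter, read after join()

    boolean run() {
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                synchronized (monitor) {            // nested: the lock is now held twice
                    waiterReady = true;
                    monitor.notifyAll();
                    try {
                        while (!go) {
                            monitor.wait();         // releases the monitor completely
                        }
                    } catch (InterruptedException e) {
                        return;
                    }
                    // Back from wait: the monitor has been reacquired.
                    heldAfterWait = Thread.holdsLock(monitor);
                }
            }
        });
        waiter.start();
        try {
            synchronized (monitor) {
                while (!waiterReady) {
                    monitor.wait();
                }
                go = true;                          // reached only while the waiter is waiting
                monitor.notifyAll();
            }
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return heldAfterWait;
    }
}
```

After the notification the waiter competes for the monitor again and continues inside both synchronized blocks.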



Stopping a thread. The stop method of class Thread is an asynchronous
method. It may be invoked by one thread to throw the error ThreadDeath
for another thread; the thread to be stopped is notified if it is waiting. It is
permitted to stop threads in Initial as well as in Exiting mode. In the former
case, if the thread is eventually started, it will immediately terminate. In the
latter case, nothing happens. In the usual case that the exception is not caught,
it propagates up to the run method (or, for the main thread, up to the main method)
of this thread (§ 20.20.15).

We model this asynchronous behavior by two rules. During invocation of
the stop method, we set the dynamic function stopped to Yes for the stopped
thread, provided this thread is not already exiting.

    if task is exp.stop() ∧ val(exp) ≠ null then    (Stop)
        if exec(val(exp)) ≠ Exiting then
            stopped(val(exp)) := Yes
        proceed

If a Runnable thread receives the stop signal, it changes its execution state to
Exiting, resets the dynamic function stopped and throws the error ThreadDeath.
Blocked threads first have to be awakened, and Notified threads first have to be
reenabled, so that they can change their state and throw the exception.

    if stopped(self) then    (Stopped)
        case exec(self) of
            Runnable : stop
            Blocked  : awake(val(exp), self, Notified)
            Notified : reenable(val(exp), self, Exiting, stop)
        where exp.wait() = task

The macro ‘stop’ used here changes the thread’s state and raises the exception.

    stop = stopped(self) := undef
           exec(self) := Exiting
           fail(ThreadDeath)

Stopped threads can continue to work: the ThreadDeath exception may be
caught and execution can proceed as if nothing had happened, except that (as
is easy to show for Java_T) the stopped thread cannot be stopped once more.

In our model the thread that receives the stop signal reacts immediately. For
implementations this might be rather expensive, so Java permits a small but bounded
amount of execution to occur before an asynchronous exception is received.
Java’s LRM (§ 11.3.2) gives only hints as to how large the ‘bounded amount’ might be.
In our model it is a matter of refining Stopped (and the macro ‘is’) by
strengthening (or weakening) the conditions to specify precisely when a stop signal is
received.

Initialization of classes and interfaces. Initialization of a class or interface in
Java_T requires synchronization, since several threads may be trying to initialize
the class at the same time. If initialization by one thread is InProgress, other
threads have to wait until the initialization is Done or an Error occurs. To
distinguish the thread that actually initializes a class from those that have to
wait, we introduce a dynamic function

    initThread : Class → Thread

and we refine the predicate ‘initialized’ so that it holds if either initialization is
Done, or initialization is InProgress and the initializing thread is the current
thread.

    initialized(class) = init(class) = Done ∨
                         (initState(class) = InProgress ∧ initThread(class) = self)

The procedure of Java_E for initializing a class or interface must then be
refined as follows. When processing the static phrase, first synchronize on the
Class object. (This is captured by a strengthened ‘is’ predicate.) If
initialization has not yet started, record that the current thread initializes the class, set
the initialization state to InProgress and either invoke the initialization for the
superclass (if any) or enter the computation of the static block.

    if task is static block ∧ init(currClass) = undef then    (Static)
        initThread(class) := self
        init(currClass) := InProgress
        enter

If initialization is InProgress by some other thread, then wait. (According to
the refined ‘initialized’ predicate, we do not start the class initialization if it is
already InProgress by the current thread.)

    if task is static block ∧ init(currClass) = InProgress then    (Static)
        backup(self) := exec(self)
        enable(classRef(currClass), self, Blocked)

If initialization is Done, then no further action is required; return
immediately.

    if task is static block ∧ init(currClass) = Done then    (Static)
        exit

The case that the initialization is erroneous (see Sect.^H) remains the same,
modulo the refinement of ‘task is phrase’ defined below.

To express the synchronization on the Class object, we strengthen the macro
‘is’ by an additional condition, which guarantees (just for entering or exiting the
class initializer) that the current thread is chosen to execute the monitor.

    task is phrase = task is phrase ∧
        (classRef(currClass) ∈ owns(self) ∨
         (locks(classRef(currClass)) ∈ {0, undef} ∧
          competing(classRef(currClass)) = self))

When executing endstatic, we again have to synchronize on the Class
object. In every case (i.e., whether mode = undef or mode = Throw(ref)) we have
to wake up all waiting threads and to reset their state. We add to the previous
Endstatic rules of Sect.^J and ^J the following one:

    if task is endstatic then    (Endstatic)
        vary thread over waitSet(classRef(currClass))
            awake(classRef(currClass), thread, backup(thread))

By closer inspection of the rules, one can observe that concurrent class
initialization may deadlock. Assume that two threads T and S start the initialization
of two different classes A and B; if during their respective initializations T
encounters a first active use of B, and similarly S encounters a first active use of A,
both threads become Blocked and there is no way that either one of the threads
becomes Runnable or Exiting again. This is in accordance with the LRM.
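
The scenario with threads T and S can be reproduced in Java. The sketch below assumes a JVM that follows the initialization procedure of the LRM; the class names, the latch used to force the interleaving and the 500 ms timeout are choices of this example. Since the two worker threads never leave their blocked state, they are marked as daemon threads so that the program can still exit.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo: two classes whose static initializers use each other.
class InitDeadlock {
    static final CountDownLatch bothStarted = new CountDownLatch(2);
    static volatile int sink;

    // Makes each initializer wait until the other initialization is in progress.
    static void rendezvous() {
        bothStarted.countDown();
        try {
            bothStarted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static class A {
        static int x;
        static {
            rendezvous();
            x = B.y + 1;        // first active use of B while A is being initialized
        }
    }

    static class B {
        static int y;
        static {
            rendezvous();
            y = A.x + 1;        // first active use of A while B is being initialized
        }
    }

    // Returns true if both threads are still blocked after the timeout.
    static boolean deadlocks() {
        Thread t = new Thread(() -> { sink = A.x; });   // T starts initializing A
        Thread s = new Thread(() -> { sink = B.y; });   // S starts initializing B
        t.setDaemon(true);
        s.setDaemon(true);
        t.start();
        s.start();
        try {
            t.join(500);
            s.join(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.isAlive() && s.isAlive();
    }
}
```

Each thread is the initThread of one class and waits for the other class, so neither initialization can reach endstatic.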

6.3 Adding Local Working Memory 

In Java_T, class and instance fields, as well as dynamic type information, are kept
in main memory, shared by all threads. The main memory is represented by the
glo, dyn and classOf functions. This model is not appropriate when Java runs on
a shared-memory multiprocessor computer supporting local working memories.

In the sequel we discuss the modifications necessary to support local
memories for instance fields. How the same strategy is adapted to support
local working memories for class fields and the classOf function is described at
the end of this section.

Local working memories of threads, as described in the LRM, can be modeled
by the dynamic function

    cache : Thread → Reference × FieldSpec → CacheValue

storing values, which are tagged as either Used or Assigned by the thread; so
the universe CacheValue is defined by:

    CacheValue ::= Used(Value) | Assigned(Value)

The LRM defines rules for when a thread is permitted or required to transfer
data between main memory, where the master copies of variables are held, and
the local working memory, where working copies reside. The following strategy,
which guides the definition of the Java model Java_T(local) in this section, is
consistent with those rules: every thread works as much as possible on its own
working copy of a variable. When a thread assigns a variable, it uses its cache
for storing it (in the sequel we abbreviate cache(self) by ‘cache’), so that we have
to refine IFieldAss as follows:

    if task is (exp1.fieldSpec = exp2) ∧ val(exp1) ≠ null then    (IFieldAss)
        cache(val(exp1), fieldSpec) := Assigned(val(exp2))
        val(task) := val(exp2)
        proceed

Variable access is slightly more complicated. The LRM requires that new
variables are always allocated in main memory. Therefore, instance field access
has to distinguish whether the variable is already cached or whether it must be
transferred from main memory to the local working memory.

    if task is (exp.fieldSpec) ∧ val(exp) ≠ null then    (IFieldAcc)
        if cache(val(exp), fieldSpec) = undef then
            cache(val(exp), fieldSpec) := Used(dyn(val(exp), fieldSpec))
            val(task) := dyn(val(exp), fieldSpec)
        else val(task) := get(cache(val(exp), fieldSpec))
        proceed
        where get(c) = if c ∈ {Assigned(v), Used(v)} then v

The synchronized statement allows reliable transmission of values from one
thread to another through shared main memory. When we enter a synchronized
block, we flush all variables (of the target reference of the synchronized
statement) from the thread’s working memory. We model this effect by extending the
application of the macro ‘lock’ in Synchronize as follows:

    lock(val(exp),
         vary f over instFields(classOf(val(exp)))
             cache(val(exp), f) := undef
         proceed)

IFieldAcc guarantees that before a variable is used, it is either assigned
or loaded from main memory.

When we leave a synchronized block, the thread must copy all assigned
values in its working memory back to main memory. For this purpose, we extend
the application of the macro ‘unlock’ in Endsynchronize as follows:

    unlock(top(owns(self)),
           vary f over instFields(classOf(top(owns(self))))
               if cache(top(owns(self)), f) = Assigned(res) then
                   dyn(top(owns(self)), f) := res
           if mode = undef then proceed
           else abrupt)

In contrast to our model, the LRM (§ 17.6) requires that all variables be flushed
from the thread’s working memory; this is overspecified. For best practice
programs our formulation is sufficient to guarantee reliable transmission of values
between different threads. Furthermore, the requirement of Java’s LRM would reduce
the expected performance gain when using shared memory multiprocessors.
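
The interplay of the refined rules can be traced with a toy Java model of one thread's working memory. This is a sketch under simplifying assumptions: there is a single shared object whose fields are named by strings and hold int values, and the class WorkingMemoryModel with its methods is invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one thread's cache for the fields of a single shared object.
class WorkingMemoryModel {
    enum Tag { USED, ASSIGNED }

    static final class CacheValue {
        final Tag tag;
        final int value;
        CacheValue(Tag tag, int value) { this.tag = tag; this.value = value; }
    }

    private final Map<String, Integer> mainMemory;                     // plays the role of dyn
    private final Map<String, CacheValue> cache = new HashMap<>();     // cache(self)

    WorkingMemoryModel(Map<String, Integer> mainMemory) {
        this.mainMemory = mainMemory;
    }

    // IFieldAss: the assignment only touches the working copy.
    void assign(String field, int value) {
        cache.put(field, new CacheValue(Tag.ASSIGNED, value));
    }

    // IFieldAcc: load from main memory on a cache miss, otherwise use the working copy.
    int access(String field) {
        CacheValue c = cache.get(field);
        if (c == null) {
            int v = mainMemory.getOrDefault(field, 0);
            cache.put(field, new CacheValue(Tag.USED, v));
            return v;
        }
        return c.value;
    }

    // Extended 'lock': flush the working copies of the object's fields.
    void onLock() {
        cache.clear();
    }

    // Extended 'unlock': copy assigned values back to main memory.
    void onUnlock() {
        for (Map.Entry<String, CacheValue> e : cache.entrySet()) {
            if (e.getValue().tag == Tag.ASSIGNED) {
                mainMemory.put(e.getKey(), e.getValue().value);
            }
        }
    }
}
```

A value assigned by one thread becomes visible to another thread only after the first has executed the write-back of ‘unlock’ and the second has flushed its cache via ‘lock’; until then the second thread may keep reading its stale working copy.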

Supporting local memories for class fields requires analogous modifications. 
It is even possible to support local versions of the classOf function. However, as 
long as garbage collection is not considered, local copies of the classOf function 
need not be retransmitted into main memory, since the classOf function is only 
assigned once. 

The LRM also formulates rules about the time delay between the transfer of
variables from main memory into local working memory and vice versa. However,
for best practice programs neither these time delays nor the different memory
models produce any semantic difference.



7 Conclusion and Outlook 

In this work we have defined a rigorous abstract operational model which
faithfully captures the programmer’s view of Java as described in the reference
manual ^3. The model can be used for standardization purposes along the lines
the ASM model for Prolog ^3 has been used for defining the ISO standard for
the semantics of Prolog. For such an endeavor it is important that our
mathematical definition of the semantics of Java yields a complete model which is
falsifiable by mental or machine experiments, in the sense of Popper, and
thus complements and enhances purely experimental studies of Java and its
implementations (see for example the Kimera project).

Our definition provides a basis for a machine and system independent
mathematical analysis of the behavior of Java programs. As an illustration we cite here
some examples of theorems we can formulate and prove in rigorous
mathematical terms for our models of Java; we hope to publish these and related results
on another occasion.

Theorem 1. In the sequence Java_I, Java_C, Java_O, Java_E, Java_T each model is
a conservative extension of its predecessor.

This theorem strongly supports the semantics (not only syntax) based
modular approach we propose for the study of Java and its implementations. In
particular it allows us to decompose the justification for the correctness of our
Java model w.r.t. the LRM into the (routine) justification of the correctness of
Java_I followed by the separate justification of the orthogonal procedural,
object-oriented, exception handling and concurrency features and their interaction.

Theorem 2. Upon correct initialization the exception handling in Java_E and
Java_T is precise. Each exception is either caught or propagates through the
sequence of method calls to the first statement with which the given program was
started.

Theorem 3. The semantics of best practice programs is the same in the main
memory model Java_T and in its refinement by local working memories
Java_T(local).

Theorem 4. The static initialization in runs of Java_C, Java_E and Java_T is correct,
i.e. it is done for each class (by exactly one thread) exactly at the first use of
the relevant field modification, constructor or method invocation. Once a class is
initialized, all its superclasses are initialized.

The modular structure, and the relegation of standard compile-time matters
to static functions, which support the comprehension of the model by humans
and its use for proving interesting properties of Java programs, are two main
features which distinguish our work from the approach of Wallace, which is
geared towards executability of the ASM specification. A comparison of these two
models illustrates the high degree of freedom ASMs offer to tune a mathematical
model to its intended use.

We are working on refinements of our Java model to the level of abstraction
of the Java Virtual Machine. These refinements take advantage of the
modular specification of orthogonal Java features as they appear in our models. The
ASMs we are developing for the JVM provide the basis for a rigorous
mathematical analysis of general compilation schemes of Java programs into JVM code,
including correctness proofs as developed for the implementation of Prolog on
the WAM and of Occam on the Transputer.

We are also working on applying our JVM models to the safety analysis of Java
byte code along the research approaches of Stata and Abadi, Qian ^9 and
Cohen.
A ASM Rules for Java

A.1 Semantic Domains

Domains for method execution

    Value  ::= Bool | Integers | Floats | Reference | {null}
    Reason ::= Break(Lab) | Continue(Lab)
             | Return | Result(Value) | Throw(Reference)

    task_T    : Thread → Phrase*
    val_T     : Thread → (Exp → Value)*
    loc_T     : Thread → (Var → Value)*
    mode_T    : Thread → Reason
    finally_T : Thread → Reason*

    tasko   = task_T(self)
    valo    = val_T(self)
    loco    = loc_T(self)
    mode    = mode_T(self)
    finally = finally_T(self)

    task   = top(tasko)
    val    = top(valo)
    loc    = top(loco)
    frames = (tasko, valo, loco)

Domains for modeling class and instance variables

    glo     : FieldSpec → Value
    dyn     : Reference × FieldSpec → Value
    classOf : Reference → Class

Domains for modeling concurrency

    ThreadState ::= Runnable | Blocked | Notified | Exiting

    exec, backup : Thread → ThreadState
    stopped      : Thread → {Yes}
    owns         : Thread → Reference*
    locks        : Reference → Nat
    waitSet      : Reference → ℘(Thread)

Domains for class initialization

    InitState ::= InProgress | Done | Error

    init       : Class → InitState
    initThread : Class → Thread

General Macros

    proceed = task := nxt(task)
    abrupt  = task := up(task)

    task is phrase = top(task_T(self)) = phrase ∧ stopped(self) = undef ∧
                     exec(self) ∈ {Runnable, Notified, Exiting}

    initialized(class) = init(class) = Done ∨
                         (initState(class) = InProgress ∧ initThread(class) = self)

A.2 Java_I: The Imperative Core

    if task is lit then    (Literal)
        val(task) := lit
        proceed

    if task is (⊖ exp) then    (UnaryExp)
        val(task) := ⊖ val(exp)
        proceed

    if task is (exp1 ⊗ exp2) ∧ (⊗ ∈ {/, %} ⇒ val(exp2) ≠ 0) then    (BinaryExp)
        val(task) := val(exp1) ⊗ val(exp2)
        proceed

    if task is var then    (VarAcc)
        val(task) := loc(var)
        proceed

    if task is (var = exp) then    (VarAss)
        loc(var) := val(exp)
        val(task) := val(exp)
        proceed

    if task is exp1 ? exp2 : exp3 then    (IfExp)
        if val(exp1) then task := fst(exp2)
        else task := fst(exp3)

    if task is exp : then    (ThenElseExp)
        val(if(task)) := val(exp)
        task := nxt(if(task))

    if task is if (exp) stm1 else stm2 then    (IfStm)
        if val(exp) then task := fst(stm1)
        else task := fst(stm2)

    if task is while (exp) stm then    (While)
        if val(exp) then task := fst(stm)
        else task := nxt(task)

    if task is jump lab; then    (Jump)
        mode := Jump(lab)
        abrupt
        for (jump, Jump) ∈ {(break, Break), (continue, Continue)}

    if task is lab : stm then    (LabStm)
        if mode = Jump(lab) then
            mode := undef
            task := jump
        else abrupt
        for (Jump, jump) ∈ {(Break, nxt(task)), (Continue, fst(stm))}

A.3 Java_C: Adding Classes

    if task is (class, field) ∧ initialized(class) then    (CFieldAcc)
        val(task) := glo(class, field)
        proceed

    if task is ((class, field) = exp) ∧ initialized(class) then    (CFieldAss)
        glo(class, field) := val(exp)
        val(task) := val(exp)
        proceed

    if task is ((class, method, fcty)(exp1, …, expn)) ∧ initialized(class) then    (CMethod)
        frames := invoke((val(exp1), …, val(expn)), args, fst(body), frames)
        where (args, body) = classMethod(class, method, fcty)

    invoke((val1, …, valn), (var1, …, varn), phrase, (tasks, vals, locs)) =
        (⟨phrase⟩ tasks, ⟨∅⟩ vals, ⟨{(var1, val1), …, (varn, valn)}⟩ locs)

    if task is return exp; then    (Result)
        mode := Result(val(exp))
        abrupt

    if task is finished ∧ mode = Result(res) ∧ length(tasko) > 1 then    (Result’)
        mode := undef
        frames := result(res, frames)

    if task is return; then    (Return)
        mode := Return
        abrupt

    if task is finished ∧ mode = Return ∧ length(tasko) > 1 then    (Return’)
        mode := undef
        frames := return(frames)

    result(res, (⟨_, inv⟩ tasks, ⟨_, val⟩ vals, ⟨_⟩ locs)) =
        (⟨nxt(inv)⟩ tasks, ⟨val ⊕ {(inv, res)}⟩ vals, locs)

    return((⟨_, inv⟩ tasks, ⟨_⟩ vals, ⟨_⟩ locs)) =
        (⟨nxt(inv)⟩ tasks, vals, locs)



A.4 Java_O: Adding Objects

    if task is new class ∧ initialized(class) then    (NewInstance)
        newInstance(class,
                    val(task) := ref)
        proceed

    newInstance(class, updates) =
        extend Reference by ref
            classOf(ref) := class
            vary f over instFields(class)
                dyn(ref, f) := instFieldValue(f)
            updates

    if task is new constrSpec(exp1, …, expn) then    (Constr)
        frames := invoke(⟨this⟩ vals, ⟨this⟩ args, fst(body), frames)
        where (args, body) = instConstr(constrSpec)
              this = if new = new class then val(new) else loc(this)
              vals = (val(exp1), …, val(expn))

    if task is this then    (This)
        val(task) := loc(this)
        proceed

    if task is (exp.fieldSpec) ∧ val(exp) ≠ null then    (IFieldAcc)
        val(task) := dyn(val(exp), fieldSpec)
        proceed

    if task is (exp1.fieldSpec = exp2) ∧ val(exp1) ≠ null then    (IFieldAss)
        dyn(val(exp1), fieldSpec) := val(exp2)
        val(task) := val(exp2)
        proceed

    if task is (exp.methodSpec{kind}(exp1, …, expn)) ∧ val(exp) ≠ null then    (IMethod)
        frames := invoke(⟨val(exp)⟩ vals, ⟨this⟩ args, fst(body), frames)
        where (args, body) = instMethod(methodSpec, class, kind)
              vals  = (val(exp1), …, val(expn))
              class = case kind of
                          Nonvirtual : currClass
                          Virtual    : classOf(val(exp))
                          Super      : super(currClass)

    if task is (exp instanceof class) then    (Instanceof)
        val(task) := val(exp) ≠ null ∧ compatible(classOf(val(exp)), class)
        proceed

    if task is (class) exp ∧
       (val(exp) = null ∨ compatible(classOf(val(exp)), class)) then    (Cast)
        val(task) := val(exp)
        proceed

A.5 Java_E: Adding Exceptions

    if task is throw exp; then    (Throw)
        if val(exp) ≠ null then
            mode := Throw(val(exp))
            abrupt
        else fail(NullPointerException)

    fail(class) = newInstance(class,
                              mode := Throw(ref)
                              abrupt)
    if task is (catch (c0 v0) b0 … catch (cn vn) bn) then    (Catch)
        if mode = Throw(exc) ∧ ∃ i : 0 ≤ i ≤ n : catches(ci) then
            loc(vk) := exc
            mode := undef
            task := fst(bk)
            where k = ι i : 0 ≤ i ≤ n : catches(ci) ∧
                          ∀ j : 0 ≤ j < i : ¬catches(cj)
        else abrupt
        where catches(class) = compatible(classOf(exc), class)

    if task is finished ∧ mode = Throw(exc) ∧ length(tasko) > 1 then    (Throw’)
        frames := throw(frames)

    throw((⟨_, inv⟩ tasks, ⟨_⟩ vals, ⟨_⟩ locs)) = (⟨up(inv)⟩ tasks, vals, locs)

    if task is finally block then    (Finally)
        finally := ⟨mode⟩ finally
        mode := undef
        task := fst(block)

    if task is endfinally then    (Endfinally)
        finally := finally′
        if mode = undef ∧ mode′ = undef then proceed
        elseif mode = undef ∧ mode′ ≠ undef then
            mode := mode′
            abrupt
        else abrupt
        where ⟨mode′⟩ finally′ = finally

    if task is (exp1 ⊗ exp2) ∧ ⊗ ∈ {/, %} ∧ val(exp2) = 0 then    (BinaryExp)
        fail(ArithmeticException)

    if (task is exp.fieldSpec ∨ task is exp.fieldSpec = exp′ ∨
        task is exp.methodSpec{kind}(exp1, …, expn)) ∧
       val(exp) = null then    (ITarget)
        fail(NullPointerException)

    if task is (class) exp ∧
       val(exp) ≠ null ∧ ¬compatible(classOf(val(exp)), class) then    (Cast)
        fail(ClassCastException)
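
The case distinction in Endfinally corresponds to familiar Java behavior, which the following sketch illustrates. The class FinallySketch and its methods are invented for this example; the comments relate each case to the rule informally.

```java
// Hypothetical demo of the two non-trivial cases of the Endfinally rule.
class FinallySketch {
    static int counter = 0;

    // mode = undef, mode' = Result(1): the pending reason is resumed after the block.
    static int pendingResultSurvives() {
        try {
            return 1;              // pending while the finally block runs
        } finally {
            counter++;             // the finally block completes normally
        }
    }

    // mode != undef at endfinally: the new reason replaces the pending Throw.
    @SuppressWarnings("finally")
    static int newReasonWins() {
        try {
            throw new IllegalStateException("pending");
        } finally {
            return 2;              // abrupt completion of the finally block itself
        }
    }
}
```

In the second method the exception is discarded, which is why returning from a finally block is usually discouraged in practice.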

A.6 Java_T: Adding Threads

    if task is exp.start() ∧ val(exp) ≠ null then    (Start)
        if exec(val(exp)) = undef then
            newFrames := start(fst(body))
            exec(val(exp)) := Runnable
            proceed
        else fail(IllegalThreadStateException)
        where ((), body) = instMethod(run, classOf(val(exp)), Virtual)
              newFrames = (task_T(val(exp)), val_T(val(exp)), loc_T(val(exp)))

    start(phrase) = (⟨phrase⟩, ⟨∅⟩, ⟨∅⟩)

    if task is finished ∧ length(tasko) = 1 ∧ exec(self) = Runnable then    (Terminate)
        exec(self) := Exiting

    if task is synchronized (exp) block then    (Synchronize)
        if val(exp) ≠ null then
            lock(val(exp), task := fst(block))
        else fail(NullPointerException)

    if task is endsynchronized then    (Endsynchronize)
        unlock(top(owns(self)),
               if mode = undef then proceed
               else abrupt)

    lock(ref, updates) =
        if ref ∈ owns(self) ∨ (locks(ref) ∈ {0, undef} ∧ competing(ref) = self) then
            owns(self) := ⟨ref⟩ owns(self)
            locks(ref) := locks(ref) + 1
            updates

    unlock(ref, updates) =
        owns(self) := pop(owns(self))
        locks(ref) := locks(ref) - 1
        updates

    if task is (exp.wait()) ∧ val(exp) ≠ null then    (Wait)
        case exec(self) of
            Runnable, Exiting : if val(exp) ∈ owns(self) then
                                    backup(self) := exec(self)
                                    enable(val(exp), self, Blocked)
                                else fail(IllegalMonitorStateException)
            Notified          : reenable(val(exp), self, backup(self), proceed)

    if task is (exp.notify()) ∧ val(exp) ≠ null then    (Notify)
        if val(exp) ∈ owns(self) then
            if waitSet(val(exp)) ≠ ∅ then
                awake(val(exp), choose_notify(waitSet(val(exp))), Notified)
            proceed
        else fail(IllegalMonitorStateException)

    enable(ref, thread, state) =
        waitSet(ref) := waitSet(ref) ∪ {thread}
        locks(ref) := 0
        exec(thread) := state

    awake(ref, thread, state) =
        waitSet(ref) := waitSet(ref) \ {thread}
        exec(thread) := state

    reenable(ref, thread, state, updates) =
        if locks(ref) ∈ {0, undef} ∧ competing(ref) = thread then
            locks(ref) := occs(owns(thread), ref)
            exec(thread) := state
            updates

    if task is exp.stop() ∧ val(exp) ≠ null then    (Stop)
        if exec(val(exp)) ≠ Exiting then
            stopped(val(exp)) := Yes
        proceed

    if stopped(self) then    (Stopped)
        case exec(self) of
            Runnable : stop
            Blocked  : awake(val(exp), self, Notified)
            Notified : reenable(val(exp), self, Exiting, stop)
        where exp.wait() = task

    stop = stopped(self) := undef
           exec(self) := Exiting
           fail(ThreadDeath)

A.7 Initialization

    if (task is (class, field) ∨ task is (class, field) = exp ∨
        task is ((class, method, fcty)(exp1, …, expn)) ∨
        task is new class) ∧ ¬initialized(class) then    (FirstActiveUse)
        frames := initialize(class, frames)

    initialize(class, frames) = invoke((), (), fst(classInit(class)), frames)

    if task is static block then    (Static)
        case init(currClass) of
            undef      : initThread(class) := self
                         init(currClass) := InProgress
                         enter
            InProgress : backup(self) := exec(self)
                         enable(classRef(currClass), self, Blocked)
            Done       : exit
            Error      : fail(NoClassDefFoundError)

    enter = if supers(currClass) ≠ ∅ ∧ ¬initialized(super(currClass)) then
                frames := initialize(super(currClass), frames)
            else task := fst(block)

    exit = if invoker = static block then
               frames := return(frames)
           else frames := goBack(frames)
           where ⟨_, invoker⟩ _ = tasko

    goBack((⟨_⟩ tasks, ⟨_⟩ vals, ⟨_⟩ locs)) = (tasks, vals, locs)

    currClass = classScope(task)

    task is phrase = task is phrase ∧
        (classRef(currClass) ∈ owns(self) ∨
         (locks(classRef(currClass)) ∈ {0, undef} ∧
          competing(classRef(currClass)) = self))

    if task is endstatic then    (Endstatic)
        vary thread over waitSet(classRef(currClass))
            awake(classRef(currClass), thread, backup(thread))
        if mode = undef then
            init(currClass) := Done
            exit
        else
            init(currClass) := Error
            if compatible(classOf(ref), Error) then
                frames := throw(frames)
            else fail(ExceptionInInitializerError)
            where Throw(ref) = mode